Biological features of cancer proteins
Date: 2009-06-10 13:36 Source: 生命在线

There are now over 2,600 classes of protein domain reported in Pfam (see online links box), a manually curated database of protein-domain families, that are encoded by genes in the human genome [37]. Of these, we found 221 in proteins that are encoded by cancer genes. We compared Pfam domains that are encoded by cancer genes to Pfam domains that are encoded by the complete human gene set (see supplementary information S2 (table)). The analysis has also been applied to subgroups of the cancer-gene list (see supplementary information S3, S4 (tables)). Compared with their representation in proteins that are encoded by the complete human gene set, at least 11 Pfam domains are clearly over-represented among proteins that are encoded by cancer genes (see supplementary information S2 (table)). These include the protein-kinase, bromodomain, helix-loop-helix (HLH), homeobox, carboxy-terminal DNA-binding (ETS), PAX, prolyl hydroxylase, MMR, HATPase_c, MYC amino-terminal and AF-4 domains.

The most commonly represented Pfam domain that is encoded by cancer genes is the protein-kinase domain, and this is also the domain for which there is the strongest evidence of over-representation: 27 cancer genes in the census encode protein-kinase domains, compared with the 6.3 that would be expected in a random selection of the same number of genes from the complete set of human genes (see supplementary information S2 (table)). Some over-representation of protein-kinase domains might be attributable to ascertainment bias. Protein kinases have long been implicated in oncogenesis and, consequently, mutations in some have been identified because they were examined as plausible candidates. However, mutations in most protein kinases in the cancer-gene census were identified through positional-cloning approaches, so ascertainment bias is unlikely to account completely for their over-representation.
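To make the 27-versus-6.3 comparison concrete, the following sketch computes the fold enrichment and a rough tail probability. It assumes the number of kinase-encoding genes in a random selection is approximately Poisson-distributed with mean 6.3; that model, and the function names, are illustrative assumptions and not the method used to produce the supplementary table.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), computed in log space for stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def poisson_upper_tail(observed: int, expected: float, extra_terms: int = 200) -> float:
    """Approximate P(X >= observed) by summing the pmf over a truncated upper tail.

    The sum stops after `extra_terms` terms; for the small means used here the
    omitted terms are negligible.
    """
    return sum(poisson_pmf(k, expected)
               for k in range(observed, observed + extra_terms))

# Counts quoted in the text above.
OBSERVED_KINASE_GENES = 27
EXPECTED_KINASE_GENES = 6.3

fold_enrichment = OBSERVED_KINASE_GENES / EXPECTED_KINASE_GENES
# Assumption: a Poisson model stands in for the random gene selection.
p_value = poisson_upper_tail(OBSERVED_KINASE_GENES, EXPECTED_KINASE_GENES)

print(f"fold enrichment: {fold_enrichment:.2f}")
print(f"P(X >= {OBSERVED_KINASE_GENES}) under Poisson({EXPECTED_KINASE_GENES}): {p_value:.2e}")
```

Under this simplified model the excess is far too large to be a chance fluctuation, which matches the text's point that ascertainment bias, and not sampling noise, is the alternative explanation worth weighing.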

Most of the protein kinases in the cancer-gene census show somatic mutations in cancer, but there are several in which germline mutations cause predisposition to neoplasia, including MET, KIT, STK11 and CDK4. Most cancer genes that encode protein kinases contain activating mutations and are dominant at the cellular level. However, a minority act in a recessive manner at the cellular level. These include ATM, STK11 and BMPR1A, which are all inactivated by mutations [23, 38, 39]. Mutated protein kinases are particularly strongly over-represented among epithelial neoplasms, but are also found in leukaemias, lymphomas and mesenchymal tumours. Dominantly acting mutated protein kinases are activated by diverse classes of mutations, including gene amplification, base substitution, in-frame large insertions and deletions (for example, FLT3 and EGFR, respectively) [33, 40], in-frame small deletions (for example, KIT, PDGFRA) [41, 42] and chromosomal translocation. Tyrosine kinases and serine/threonine kinases are both represented in the cancer-gene census. However, tyrosine kinases are over-represented compared with serine/threonine kinases, accounting for approximately one quarter of all the known protein kinases and two-thirds of the protein kinases that are encoded by cancer genes. Interestingly, phosphatases are not prominent in the cancer-gene census: one tyrosine phosphatase is listed, approximately the expected number in a random selection of genes from the complete set of human genes.
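The tyrosine-kinase proportions above can be turned into a rough calculation. The sketch below assumes that "two-thirds" of the 27 kinase-encoding cancer genes means 18 genes, and treats each gene as an independent draw with a one-quarter chance of being a tyrosine kinase. Both the rounded count and the binomial model are illustrative assumptions, not figures or methods taken from the census.

```python
import math

def binomial_upper_tail(successes: int, trials: int, p: float) -> float:
    """P(X >= successes) for X ~ Binomial(trials, p), summed exactly."""
    return sum(math.comb(trials, k) * p**k * (1 - p) ** (trials - k)
               for k in range(successes, trials + 1))

KINASE_CANCER_GENES = 27      # kinase-encoding cancer genes, from the text
BACKGROUND_TK_FRACTION = 0.25  # "approximately one quarter" of all kinases

# Assumption: "two-thirds" of 27 is taken as exactly 18 tyrosine kinases.
tyrosine_kinase_genes = round(KINASE_CANCER_GENES * 2 / 3)

observed_fraction = tyrosine_kinase_genes / KINASE_CANCER_GENES
relative_enrichment = observed_fraction / BACKGROUND_TK_FRACTION
expected_count = KINASE_CANCER_GENES * BACKGROUND_TK_FRACTION
p_value = binomial_upper_tail(tyrosine_kinase_genes, KINASE_CANCER_GENES,
                              BACKGROUND_TK_FRACTION)

print(f"expected tyrosine kinases: {expected_count:.2f}, assumed observed: {tyrosine_kinase_genes}")
print(f"relative enrichment: {relative_enrichment:.2f}")
print(f"binomial upper-tail probability: {p_value:.2e}")
```

Because the source gives only approximate fractions, the output is best read as an order-of-magnitude indication that the tyrosine-kinase excess is unlikely to be chance.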

After protein kinases, the most frequently over-represented Pfam domains are those that broadly constitute components of proteins that are implicated in transcriptional regulation. These include HLH, ETS, PAX, homeobox, MYC N-terminal, bromodomain, AF-4 and PHD domains. Many of these (for example, PAX, ETS, AF-4, HLH, bromodomain and MYC N-terminal) are over-represented tenfold or more in the cancer-gene census, compared with the numbers expected from a random selection of human proteins. In contrast to the protein-kinase domain, most domains that are involved in transcriptional regulation are encoded by cancer genes activated by chromosomal translocations in leukaemias, lymphomas and mesenchymal tumours.

The final group of domains that are clearly over-represented among cancer genes is associated with DNA maintenance and repair (MMR and HATPase_c domains). Mutated cancer genes encoding these domains generally act in a recessive manner at the cellular level, are inactivated during oncogenesis (resulting in increased somatic mutation rates) and often have germline mutations that result in cancer predisposition. Indeed, a substantial proportion of germline-mutated genes that cause cancer predisposition are involved in DNA maintenance and repair.

Other Pfam domains that are frequently encoded by cancer genes are not necessarily over-represented in the cancer-gene census. For example, ten cancer genes encode C2H2 zinc-finger domains (which are implicated in DNA binding and transcriptional regulation). However, the C2H2 zinc finger is a common motif and this is the number that would be expected based on a random selection of human genes. Certain Pfam domains are, however, under-represented among cancer genes. For example, only one cancer gene encodes a rhodopsin-like seven-transmembrane domain, compared with nine expected (see supplementary information S2 (table)). Rhodopsin-like seven-transmembrane domains form a large class of G-protein-coupled receptors (GPCRs) that respond to a wide variety of signals. Their under-representation among cancer genes is perhaps surprising, given the over-representation of protein kinases, as both groups of proteins are involved in signal transduction. However, the results indicate that the normal metabolic connections of many GPCRs do not substantially influence the processes of cell proliferation, differentiation and death that underlie neoplastic change.
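The contrast between a domain that is merely common and one that is depleted can be illustrated with the counts quoted above. This sketch again assumes a Poisson model for the number of cancer genes carrying a domain; the model and the helper name are assumptions for illustration, and the expected count of ten for C2H2 zinc fingers follows from the statement that ten is the number expected.

```python
import math

def poisson_lower_tail(observed: int, expected: float) -> float:
    """P(X <= observed) for X ~ Poisson(expected)."""
    return sum(math.exp(-expected) * expected**k / math.factorial(k)
               for k in range(observed + 1))

# (observed, expected) counts of cancer genes per domain, from the text.
DOMAIN_COUNTS = {
    "rhodopsin-like seven-transmembrane": (1, 9.0),
    "C2H2 zinc finger": (10, 10.0),
}

results = {}
for domain, (observed, expected) in DOMAIN_COUNTS.items():
    ratio = observed / expected
    # Assumption: Poisson model for the count in a random gene selection.
    p_low = poisson_lower_tail(observed, expected)
    results[domain] = (ratio, p_low)
    print(f"{domain}: observed/expected = {ratio:.2f}, P(X <= observed) = {p_low:.4f}")
```

A ratio near 1 with an unremarkable tail probability, as for the C2H2 zinc finger, indicates a domain that appears often only because it is common in the genome, whereas a small ratio with a small lower-tail probability points to depletion.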

This census provides a detailed view of cancer-associated genes, the mutations that contribute to tumorigenesis and the functional consequences of the resulting structural abnormalities. Some clear patterns and questions emerge. Even with our relatively conservative inclusion criteria, we find that more than 1% of genes in the human genome are involved in oncogenesis. The total number of human cancer genes remains a matter for speculation. For most individual adult epithelial cancers, it is not possible at present to identify the four to seven somatically mutated cancer genes that are usually proposed to be necessary (as a minimum) for cancer development. There also seem to be more cancer genes on the way to being identified by conventional strategies. For example, there are several recurrent copy-number abnormalities found in human cancer for which the target gene has yet to be definitively identified, and there could be many genes with germline sequence variants that confer a small additional risk of cancer (low-penetrance cancer-susceptibility genes). Moreover, positional-cloning strategies (in the past, the most influential approaches to cancer-gene identification) might have completely missed many mutated cancer genes simply because they do not yield informative positional cues. Finally, the full role of promoter methylation in cancer and the number of genes that contribute to oncogenesis when modified in this way are yet to be clarified. So, it is plausible (although unproven) that many more cancer genes remain to be identified. The finished human genome sequence now offers new opportunities for identifying cancer genes. It will be interesting to observe whether the current patterns persist, or whether they predominantly reflect the technical opportunities and constraints that have prevailed in the past twenty years of cancer-gene identification.


 
Editor: 古流骏