How to cite this article: Peralta-Rodríguez R, Valdivia A, Mendoza M, Rodríguez J, Marrero D, Paniagua L, Romero P, Taniguchi K, Salcedo M. Genes associated to cancer. Rev Med Inst Mex Seguro Soc. 2015;53 Supl 2:S178-87.
CURRENT THEMES
Received: October 22nd 2014
Accepted: May 15th 2015
Raúl Peralta-Rodríguez,a Alejandra Valdivia,a Mónica Mendoza,a Jade Rodríguez,a Daniel Marrero,a Lucero Paniagua,a Pablo Romero,a Keiko Taniguchi,a Mauricio Salcedoa
aLaboratorio de Oncología Genómica, Unidad de Investigación Médica en Enfermedades Oncológicas, Hospital de Oncología, Instituto Mexicano del Seguro Social, Distrito Federal, México
Communication with: Mauricio Salcedo-Vargas
Telephone: (55) 5627 6900, extension 22706
Email: maosal89@yahoo.com
In 2010, a census of cancer genes enumerated 291 genes. These represent close to 1% of all human genes, and there is enough biological evidence that they belong to a new classification of genes, known as cancer genes. These have been defined as the genes that cause sporadic or familial cancer when they mutate. The mutation types for these genes include amplifications, point mutations, deletions, and genomic rearrangements, among others, which lead to protein overexpression, silencing, production of chimeric proteins, or de novo expression. Taken together, the genes affected by these genomic or gene expression alterations, when they contribute to the development of cancer, are called cancer genes. This list is likely to grow as new strategies, such as genomic analysis, are developed.
Keywords: Genes, Mutation, HER2/neu, CRBP1.
Just a decade ago the first draft of the human genome was published. It enumerated about 28,000 genes.1,2 It was undoubtedly a landmark event, made possible by the technological race needed to complete this draft. Of the genes in this catalog, about 1% (291 genes) show sufficient biological evidence that they belong to a new classification of genes, cancer genes. These are defined as genes involved in susceptibility, development, and progression of different cancers when they do not function normally.3
The studies published by Drs. H. Varmus and M. Bishop in the eighties provided the starting point for a classification of genes associated with cancer. Thanks to the knowledge of retroviruses, it was possible to determine that such viruses contain human-like sequences capable of generating tumor growth. Because of this they were called oncogenes (which means active genes). In their human counterparts, these sequences were found to operate normally, so they were called proto-oncogenes. Later it was discovered that these proto-oncogenes could undergo different types of mutations that cause their molecular activation, thus becoming the well-known cellular oncogenes. Something similar happened with the so-called anti-oncogenes, which "supposedly" negatively regulated oncogenes; when they suffer genetic damage, in particular point mutations or deletions, they are inactivated and this molecular brake is lost. Subsequently, given their characteristics, these came to be known as tumor suppressor genes.4
For many years there have been considerable efforts to identify specific genes, or "markers", involved in cancer development. Such is the case of BRCA1 and BRCA2 genes, which were described in the nineties as genes for breast and ovarian cancer in hereditary cases and which have now been incorporated into everyday clinical practice in molecular diagnostics in oncology.5 The same happened to the gene ERBB2 (HER2-neu), which was reported in the eighties as a gene for breast cancer.6 Since then, there have been multiple studies that support the biological evidence of overexpression of this gene as a poor prognostic factor in patients with breast cancer; furthermore, immunotherapy has been developed to significantly improve the prognosis in such patients.7 Similarly, in recent years there have been many studies on the identification of new cancer genes; these studies have increasingly been incorporated into the molecular diagnosis of cancer. Such is the case of aberrant expression of CRBP1 gene, which was reported in 2002 as associated with different types of cancer, and which in 2010, with the application of methodologies for genomic analysis, was associated with survival in patients with laryngeal cancer.8,9
In 2010 a census of cancer genes enumerated 291 human genes, representing about 1% of all genes, for which there is sufficient biological evidence that they cause sporadic or familial cancer when mutated. The types of mutations in these cancer genes include genomic rearrangements, point mutations, deletions, amplifications, and others (Figure 1).3
Figure 1 Genetic alterations and expression changes in the cancer cell. A shows some of the most frequent alterations in the genetic material of the cancer cell, such as amplification, loss, point mutation, translocation, polymorphisms, methylation, and viral insertion. B shows changes in gene expression of the cancer cell, including: overexpression, silencing, protein mutation, chimeric protein, and protein variation.
For example, for mutations of the amplification type (that is, gaining extra gene copies), only six cancer genes were identified whose alteration by gene amplification results in overexpression: AKT2, ERBB2, MYC, MYCL1, MYCN, and REL. This list is short not because gene amplification is a rare mechanism in cancer; on the contrary, amplified regions (formerly known as DM, or double minutes) have been found in virtually all cytogenetic studies of cancer cells. Rather, it reflects the difficulty of identifying the specific cancer-associated genes located in amplicons (amplified DNA fragments), mainly because amplicons usually contain a large number of genes. The term gene amplification refers to a somatically acquired increase in the copy number of a specific region of the genome, which regularly results in overexpression of the genes located in that region. The amplification mechanism is complex and includes the cutting and fusion of chromosome fragments, the formation and reinsertion of DM chromosomes, or the formation of clusters of small genomic fragments.10 Gene amplification events involve multiple genes across the whole genome. However, gene amplification alone is insufficient to identify a cancer gene within an amplicon: evidence of overexpression in tumors carrying that amplicon is also necessary, as is the correlation of amplification or overexpression with the clinical characteristics of the patients and, in some cases, biological investigation of the gene's function and of the effectiveness of drugs against the overexpressed protein. Interpreting these data can also be difficult if more than one gene contributes to the biological effect of an amplicon, or if the genes that promote the tumor within a genetically defined amplicon differ between types of cancer.
To identify genes that promote tumor development within an amplicon, it is necessary to analyze whether different tumors share what is known as a minimal region of amplification. For example, in the 2p24 amplicon, which is common to all neuroblastomas, the MYCN gene has been identified as a cancer gene acting through a gene amplification mechanism.11 However, one cannot exclude the biological contribution of adjacent genes that are co-amplified; for example, the DDX1 gene on the 2p24 amplicon may, like MYCN, be a cancer gene in this same amplicon.12
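The idea of a minimal region of amplification can be illustrated with a short sketch. It assumes that each tumor contributes one amplified segment on the same chromosome arm, given as start and end coordinates in megabases; the coordinates and the gene names (GENE_A, GENE_B, GENE_C) are hypothetical and serve only to show the interval logic, not real 2p24 data.

```python
from typing import Optional

# Hypothetical amplified segments (start, end) in megabases on one chromosome
# arm, one per tumor sample. The coordinates are invented for illustration.
Segment = tuple[float, float]


def minimal_common_region(segments: list[Segment]) -> Optional[Segment]:
    """Return the interval shared by every amplified segment, or None."""
    if not segments:
        return None
    start = max(s for s, _ in segments)  # latest start among all samples
    end = min(e for _, e in segments)    # earliest end among all samples
    return (start, end) if start < end else None


def genes_in_region(region: Segment, genes: dict[str, Segment]) -> list[str]:
    """List genes whose span lies entirely inside the region."""
    lo, hi = region
    return sorted(name for name, (s, e) in genes.items() if lo <= s and e <= hi)


tumor_amplicons = [(14.0, 17.5), (15.2, 18.0), (15.5, 16.9)]
gene_positions = {
    "GENE_A": (14.2, 14.4),
    "GENE_B": (15.6, 15.9),
    "GENE_C": (16.2, 16.5),
}

region = minimal_common_region(tumor_amplicons)
candidates = genes_in_region(region, gene_positions) if region else []
print(region, candidates)
```

In this toy case two genes remain inside the shared region, which mirrors the situation described for MYCN and DDX1: the minimal region narrows the list of candidates but does not by itself single out one gene.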
Another clear example of this phenomenon occurs in sporadic breast cancer, in which the 11q13 amplicon is found; it contains the cancer genes CCND1, PAK1, and EMSY. The activity of cyclin D1 (CCND1) is required for the G1/S transition of the cell cycle, so its overexpression promotes cell proliferation; the EMSY protein suppresses the activity of BRCA2, which is crucial for DNA repair; and the PAK1 kinase regulates cell motility and must be inhibited during apoptosis, so that, when it is overexpressed, damaged cells cannot undergo normal apoptosis. Overexpression of these proteins explains the cellular transformation associated with the 11q13 amplicon in sporadic breast cancer.13
Molecular tests that specifically identify amplified genes contributing to the development of cancer are especially valuable when specific drugs are directed against the overexpressed proteins. This has been shown with the antibody trastuzumab in the treatment of metastatic breast cancer in patients overexpressing the ErbB2 receptor. Trastuzumab combined with chemotherapy substantially improves the prognosis of patients compared with chemotherapy alone. The effectiveness of this treatment depends on identifying ERBB2 overexpression by immunohistochemistry or ERBB2 amplification by fluorescence in situ hybridization; this is how patients are selected for administration of the drug, leading to a more individualized treatment (genomic medicine) (Figure 2).28
Figure 2 Amplification of HER-2/neu in breast cancer. Amplification and overexpression of this gene have allowed the design of targeted antibodies that improve the patient's prognosis.
This example demonstrates the importance of identifying cancer genes, because it makes it possible to design targeted treatments that substantially improve the quality of life of cancer patients. For this purpose it is essential not just to identify cancer genes, but also to describe their mechanisms of action.
Correlation between amplified genes and the patient's prognosis is not often carried out, mainly because of the difficulty of patient follow-up. However, such correlations can be used to determine whether an amplification event is involved in the patient’s clinical course. With the introduction of CGH microarrays (tiling arrays) it is now possible to identify specific genes within the amplicon and to refine the patient's prognosis. For example, an analysis of Wilms’ tumor samples identified the 1q25.3 amplicon as associated with poor prognosis in these patients.22
There are different cytogenetic techniques for studying chromosomal imbalances or amplified regions. One of these is comparative genomic hybridization (CGH). This molecular cytogenetic technique was developed in the early nineties and has been widely used for the analysis of chromosomal imbalances in various tumor types. With this methodology the entire genome of the tumor can be analyzed in a single experiment, and losses, gains, or amplifications of chromosomal material can be found at the chromosomal level. However, its resolution is limited to 10 Mbp (10 million base pairs), so it does not detect alterations at the gene level. This limitation has been overcome with the development of the CGH microarray, which does not use metaphases but rather an array of clones arranged on glass or silicon surfaces. The resolution of this technique can reach the gene level and even the DNA sequence, and the final analysis does not require conventional karyotyping, but rather direct analysis with a special microarray scanner coupled with a computer program.14,15 The development of microarray technology has served as a useful tool for the molecular study of cancer. With this technology it has been possible to identify specific genes that undergo changes in copy number within an amplicon, and thereby to identify possible cancer genes.16
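How a CGH microarray readout becomes a call of loss, gain, or amplification can be sketched as follows. The sketch assumes the common convention of comparing tumor and reference signal for each clone as a log2 ratio; the cutoff values and the clone intensities are illustrative assumptions, not values taken from the studies cited here.

```python
import math

# Assumed cutoffs on the log2(tumor/reference) ratio; real studies tune
# these to their own platform and noise level.
GAIN_CUTOFF = 0.3
LOSS_CUTOFF = -0.3
AMPLIFICATION_CUTOFF = 1.0


def call_copy_number(tumor_signal: float, reference_signal: float) -> str:
    """Classify one array clone from its tumor and reference intensities."""
    ratio = math.log2(tumor_signal / reference_signal)
    if ratio >= AMPLIFICATION_CUTOFF:
        return "amplification"
    if ratio >= GAIN_CUTOFF:
        return "gain"
    if ratio <= LOSS_CUTOFF:
        return "loss"
    return "normal"


# Hypothetical clones: name -> (tumor intensity, reference intensity)
clones = {
    "clone_1": (1000.0, 1000.0),
    "clone_2": (1500.0, 1000.0),
    "clone_3": (4000.0, 1000.0),
    "clone_4": (500.0, 1000.0),
}
calls = {name: call_copy_number(t, r) for name, (t, r) in clones.items()}
print(calls)
```

Because each clone maps to a known genomic position, consecutive clones called as amplified delimit an amplicon, and the genes they cover become the candidates discussed above.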
It is noteworthy that genes that contribute to tumor development may also be overexpressed by other mechanisms in the absence of DNA amplification. One example is the MDM2 gene in human sarcomas. This gene codes for the MDM2 protein, a negative regulator of p53, so that when overexpressed it inhibits the action of the tumor suppressor gene P53.17 Another example is the KIT gene (4q12), which, like MDM2, can be activated in the absence of amplification, in this case by point mutations in testicular germ cell tumors. This gene encodes a growth factor receptor on progenitor cells, so its overexpression in germ cells leads to proliferation.18 Mutations and amplifications represent different mechanisms of gene activation, and they are usually mutually exclusive events.
Point mutations are widely found in the P53 gene (the master or guardian gene of the cell cycle, as named by Science magazine in 1993) in many human cancers.19 This gene is located in the 17p13 region and encodes a nuclear transcription factor that is essential for inducing the cell response to DNA damage, and it therefore plays an important role in apoptosis and the cell cycle. A defective P53 allows damaged cells, which would otherwise be marked for cell death, to proliferate, resulting in cancer. It has been reported that up to 50% of human cancers have mutations in this gene; such is the case of G>T mutations in codons 157, 158, 245, 248, 249, and 273.21
Of special interest, and fortunately rare, are inherited point mutations that may predispose to cancer, for example inherited mutations in cyclin-dependent kinase 4 (CDK4). Another example, which in many ways breaks the "one gene, one disease" rule, is the classic RET oncogene, associated with sporadic and hereditary medullary thyroid cancer; however, alterations in this gene, such as translocations, may also be associated with papillary thyroid cancer. In a follow-up study of a family affected by hereditary medullary thyroid cancer, the RET C634Y mutation was confirmed, which confers high penetrance of the disease at an early age (Figure 3).20
Figure 3 Penetrance of the C634Y RET mutation. Tracking a family with inherited mutation in the RET oncogene. The change in a single amino acid predisposes them to medullary thyroid cancer (taken from Beatriz Gonzalez et al.)20
Among genomic rearrangements, the most common is chromosomal translocation. In this type of mutation, parts of two chromosomes exchange positions, which can result in production of a chimeric protein, overexpression of one gene, or loss of function of another gene. The widely documented classic example is the Philadelphia chromosome, also called the Philadelphia translocation, which is found in up to 95% of cases of chronic myeloid leukemia. This abnormality involves regions 9q34 and 22q11; that is, the q34 region of chromosome 9 is fused to the q11 region of chromosome 22. This results in the fusion of the BCR gene on chromosome 22 to the ABL gene on chromosome 9 (Figure 4). The function of ABL is to attach phosphate groups to tyrosine residues, and the resulting BCR-ABL fusion protein remains continually active without the need for other regulatory proteins; this in turn activates other proteins that control the cell cycle and inhibits DNA repair, causing genomic instability. The abnormality was discovered and described in 1960 by Nowell and Hungerford, a pair of researchers from Philadelphia (hence the name of the genetic alteration), and later Dr. Janet Rowley identified a translocation as the source of the abnormality.26
Figure 4 Translocation and deletion in the cancer cell. A) 9q34-22q11 chromosome translocation in leukemia (Philadelphia chromosome). This translocation leads to cell proliferation. B) DCC gene deletion in colon cancer. The loss of this gene promotes metastasis and invasion. Both types of alterations lead to poor survival prognosis in patients
A deletion of the DCC gene is found in up to 70% of colon cancers. This gene is located in region 18q21 and encodes a protein with cell adhesion properties, so its loss reduces cell adhesion and increases the cell's capacity for invasion (Figure 4). Allelic losses are associated with an increased rate of metastasis and lower life expectancy.27
Figure 5 Cancer genes. Cancer genes are distributed throughout virtually the entire genome (modified from Santarius et al.)3
There are different criteria used to classify cancer genes. One molecular classification is based on the type of mutation involved: dominant mutations (needing only a single allele to be mutated) or recessive mutations (requiring both alleles to be mutated). This system classifies genes by their ability to promote or inhibit cell growth and by the type of mutation required to activate or inactivate them. Genes that promote uncontrolled cell growth once they are activated by different types of mutations were named oncogenes. Their activation is dominant; that is, a mutation in only one allele is enough to affect expression. The other group of cancer genes, the tumor suppressor genes, have the ability to inhibit cell growth and are inactivated by recessive mutations; that is, both alleles must be mutated to abolish expression of these genes.29
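The dominant and recessive rule described above can be written as a small decision function. This is a deliberately simplified sketch: it assumes exactly two alleles per gene and only the two categories named in the text, and the allele counts in the examples are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    kind: str             # "oncogene" or "tumor_suppressor"
    mutated_alleles: int  # 0, 1, or 2 of the two alleles


def contributes_to_cancer(gene: Gene) -> bool:
    """Apply the dominant/recessive rule described in the text."""
    if gene.kind == "oncogene":
        # Dominant: one mutated allele is enough to activate the gene.
        return gene.mutated_alleles >= 1
    if gene.kind == "tumor_suppressor":
        # Recessive: both alleles must be mutated to inactivate the gene.
        return gene.mutated_alleles == 2
    raise ValueError(f"unknown gene kind: {gene.kind}")


# Hypothetical allele states for two genes mentioned in the article.
examples = [
    Gene("MYC", "oncogene", 1),
    Gene("TP53", "tumor_suppressor", 1),
    Gene("TP53", "tumor_suppressor", 2),
]
results = [contributes_to_cancer(g) for g in examples]
print(results)
```

Under this rule a single mutated copy of an oncogene already counts, while a tumor suppressor with one intact allele does not.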
A classification has recently been made of cancer genes into four classes (I-IV) based on the following criteria:
Based on these criteria about 200 genes have been defined within class IV, which essentially correspond to results of genomic studies; 62 genes in class III; 12 genes in class II, and six genes in class I.3
Genomic analyses such as gene expression microarrays, serial analysis of gene expression (SAGE), and CGH microarrays have expanded the number of potential cancer genes; for example, Nikolosky et al. identified 1747 amplified genes, organized into 30 amplicons, in breast cancer samples. This list was then searched for genes reported to be altered by other mechanisms (point mutations, translocations, etc.) in the same type of cancer. Using the aforementioned criteria, this study identified nine class III genes.30 Given these results, the number of cancer genes is expected to increase in the coming years.
There are two ways of acquiring mutations in cancer genes: somatic mutations, acquired by exposure to environmental factors, and mutations transmitted through the germline, which result in cancer susceptibility. Fortunately, about 90% of cancer genes show somatic mutations, while the rest have germline mutations.31
When somatic mutations occur in genes that regulate proliferation, differentiation, cell death, or DNA repair, the cell may be transformed, which can lead to cancer. It is known that somatic mutations occur randomly in normal cells; however, most are transient or do not confer any clonal growth advantage on the cell, and the repair system also ensures that most of these mutations do not lead to a cancerous phenotype. When the somatic mutation rate increases, however, so does the likelihood that these mutations cannot be repaired or that the repair system itself is affected, which leads to transformation of the cell.31
In general, the spectrum of neoplasms associated with germline mutations of a particular gene is similar to that reported for somatic mutations. However, there are notable exceptions to this rule; for example, somatic mutations of the TP53 gene are found in more than half of colorectal cancers, whereas germline mutations do not appear to cause predisposition to this type of cancer. Conversely, some genes with germline mutations that cause predisposition to cancer show a very low rate of somatic mutation in sporadic cancers, such as the BRCA1 and BRCA2 genes in breast cancer. The reasons for these differences between tumors with somatic mutations and tumors associated with germline mutations are unknown.31
Originally (in the eighties and nineties) cancer genes were identified by positional cloning, a slow and laborious molecular method that requires no prior hypothesis about their biological function. With this strategy, a cancer gene is first localized to a small part of the genome, which is then examined for mutations. The initial positional clues were diverse: chromosomal rearrangements visible at metaphase in neoplastic cells, copy number changes in the DNA of tumor cells, and, for cancer susceptibility genes, genetic linkage analysis in families with many cases of cancer.32
Cancer genes have also been detected by biological assays. The most notable example is the NIH-3T3 transformation assay, in which DNA from transformed human cells is introduced into a mouse fibroblast line. The line takes up mutated human cancer genes and acquires the transformed phenotype.34 The remaining cancer gene mutations have been identified by analyzing candidate genes selected from the known biology of tumor cells. However, this approach has yielded only a small number of cancer genes.
Determining the potential role of a gene in cancer requires various methodologies. Some are used to elucidate the biological importance of specific genes for tumor development. For example, short interfering RNA (siRNA) is used to block the expression of genes that are amplified and overexpressed in cell lines.23 Such experiments provide functional results on the genes involved in cancer development in vitro, so they must be interpreted carefully. siRNA studies in breast cancer cell lines with the 17q12 amplicon showed that the ERBB2 gene and the adjacent co-amplified genes GRB7 and STARD3 contribute in the same way to the biological effect of this amplicon.24 Analogous in vitro experiments with drugs that specifically target amplified gene products have shown, in the case of the MET gene, that these drugs inhibit the growth only of gastric cancer cell lines containing the 7q31 amplicon, where this gene is found.25
This problem could further explain the "bottleneck" in defining the association of a gene with cancer. That is, it is becoming ever more evident that there is no single specific marker in cancer. There are exceptions, for example, the Philadelphia chromosome in chronic myeloid leukemia (95% of cases), the RET oncogene in medullary thyroid cancer (100%), and the DCC suppressor gene in colorectal cancer (70%). Another possible explanation is that current methods are not effective enough to define a new cancer gene.26,27
There are currently more than 2600 kinds of protein domains reported in Pfam (a database of protein domain families) that are encoded by genes in the human genome. Of these domains, 221 are found in proteins encoded by cancer genes. Further analysis revealed that at least 11 Pfam domains are clearly overrepresented in the proteins encoded by cancer genes. These include the protein kinase, bromodomain, helix-loop-helix, homeobox, carboxyl-terminus DNA binding protein, PAX, proline hydrolase, MMR, ATPase, amino-terminus MYC, and AF-4 domains.33
The most common domain encoded by cancer genes is the protein kinase domain, which is also the domain for which there is growing evidence of overrepresentation. There are 27 cancer genes encoding this domain, compared with the six that would be expected in a random selection of the same number of genes from the set of human genes. Most of the genes encoding these proteins show somatic mutations in cancer; however, some also have germline mutations associated with cancer, as with the genes MET, KIT, STK11, and CDK4. Another feature of these genes is that their mutations are dominant at the cellular level. However, a minority act recessively at the cellular level, such as the cancer genes ATM, STK11, and BMPR1A, which are inactivated by mutation. The mutations in these genes are found mainly in epithelial tumors and, to a lesser extent, in leukemias, lymphomas, and mesenchymal tumors. The dominantly acting mutations in these genes include gene amplification, base substitutions, and deletions, as in the FLT3 and EGFR genes. The two major classes of protein kinases overrepresented among cancer genes are tyrosine kinases and serine-threonine kinases; tyrosine kinases account for a quarter of all known protein kinases but two-thirds of the protein kinases encoded by cancer genes.34
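The comparison of 27 observed against six expected kinase genes can be made concrete with a small enrichment calculation. The sketch below uses the figures quoted in this article (about 28,000 human genes, 291 cancer genes, 27 observed, six expected). The genome-wide number of kinase genes is not stated in the text, so it is back-calculated from the expected count, and the hypergeometric test is an illustrative choice rather than the method used in the cited analysis.

```python
from math import comb

TOTAL_GENES = 28_000       # approximate gene count quoted earlier in the article
CANCER_GENES = 291         # size of the cancer gene census
OBSERVED_KINASES = 27      # cancer genes encoding a protein kinase domain
EXPECTED_KINASES = 6       # expected in a random sample of the same size

# Assumption: back-calculate the genome-wide number of kinase genes from the
# expected count, since the article does not state it directly.
KINASE_GENES = round(EXPECTED_KINASES / CANCER_GENES * TOTAL_GENES)


def hypergeom_tail(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) when drawing `draws` genes without replacement."""
    total = comb(population, draws)
    upper = min(successes, draws)
    favorable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favorable / total


fold_enrichment = OBSERVED_KINASES / EXPECTED_KINASES
p_value = hypergeom_tail(OBSERVED_KINASES, TOTAL_GENES, KINASE_GENES, CANCER_GENES)
print(f"fold enrichment = {fold_enrichment:.1f}, P(X >= 27) = {p_value:.2e}")
```

The fold enrichment is simply 27/6 = 4.5; the tail probability indicates how unlikely such an excess would be if cancer genes were a random sample of the genome under these assumptions.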
After protein kinases, the most overrepresented Pfam domains are those found in proteins involved in transcriptional regulation, such as the domains HLH, ETS, PAX, homeobox, MYCN, bromodomain, AF-4, and PHD.
The last group of domains that are overrepresented among cancer genes are those associated with DNA maintenance and repair (MMR and ATPase domains). These cancer genes generally act recessively at the cellular level and often carry germline mutations that result in predisposition to cancer. Therefore, a large proportion of the genes whose germline mutations cause cancer predisposition are involved in DNA maintenance and repair.
There are other Pfam domains that are often encoded by cancer genes but are not necessarily overrepresented. For example, 10 cancer genes encode zinc finger domains (C2H2), which are involved in DNA binding and transcriptional regulation. However, zinc fingers are a common motif and this number would be expected in a random selection. Certain Pfam domains are underrepresented among cancer genes; for example, only one cancer gene encodes a rhodopsin-like transmembrane domain, which belongs to a large family of G protein-coupled receptors (GPCR) that respond to a variety of signals. Their low representation among cancer genes is surprising, given the overrepresentation of protein kinases, since both groups of proteins are involved in signal transduction. However, the results indicate that the normal metabolic connections of many GPCR do not substantially influence the processes of cell proliferation, differentiation, and death that underlie neoplastic change.35
There are recurrent copy number abnormalities in virtually all types of human tumors for which the target gene has not yet been identified, and there may be genes with variants in the germline sequence that confer additional cancer risk (cancer susceptibility genes). Conventional strategies such as positional cloning might overlook many mutated cancer genes, simply because they do not provide the big picture of mutations across the entire genome. In contrast, new strategies for global analysis have identified genes with copy number changes, and even genes with changes in expression. There are therefore probably still cancer genes to be identified, and this is more feasible now that the complete human genome is available.
So far approximately 1% of human genes have been identified as cancer genes (Figure 5). Applying different criteria, fewer than 300 candidates have been identified. This list may grow with the new strategies being developed, such as pyrosequencing and SNAPShot sequencing, among others.
This work has been partially supported by CONACYT sectoral fund projects 69719 and 87244.
Conflict of interest statement: The authors have completed and submitted the form translated into Spanish for the declaration of potential conflicts of interest of the International Committee of Medical Journal Editors, and none were reported in relation to this article.