How to cite this article: Santos-López G, Márquez-Domínguez L, Reyes-Leyva J, Vallejo-Ruiz V. General aspects of structure, classification and replication of human papillomavirus. Rev Med Inst Mex Seguro Soc. 2015;53 Supl 2:S166-71.
CURRENT THEMES
Received: October 22nd 2014
Accepted: May 15th 2015
Gerardo Santos-Lópezᵃ, Luis Márquez-Domínguezᵃ, Julio Reyes-Leyvaᵃ, Verónica Vallejo-Ruizᵃ
ᵃLaboratorio de Biología Molecular y Virología, Centro de Investigación Biomédica de Oriente, Instituto Mexicano del Seguro Social, Metepec, Atlixco, Puebla, México
Correspondence: Gerardo Santos-López
Telephone: (244) 444 0122
Email: gerardo.santos.lopez@gmail.com
Human papillomavirus (HPV) refers to a group of viruses that belongs to a larger group commonly known as papillomaviruses, which are taxonomically placed in the family Papillomaviridae. Papillomaviruses are small, non-enveloped viruses with a double-stranded DNA genome and an affinity for epithelial tissue. Many of them infect humans; they induce benign lesions of the skin (warts) and mucous membranes (condylomas), but some are also associated with epithelial malignancies such as cervical cancer and other tumors of the urogenital tract. Papillomaviridae contains 16 genera, each named with a Greek-letter prefix and the suffix -papillomavirus (e.g., Alphapapillomavirus, Betapapillomavirus). From a clinical point of view, human papillomaviruses that infect the genital tract (which belong to the genus Alphapapillomavirus) have been divided into two groups: low-risk types, associated with benign genital warts, and high-risk types, with oncogenic potential, which are the etiological agents of cervical cancer. In this paper we review relevant aspects of the structure, replication cycle, and classification of human papillomaviruses.
Keywords: Human papillomavirus, High risk HPV, HPV classification.
Papillomaviruses comprise a group of small, non-enveloped viruses with a double-stranded DNA genome and an affinity for epithelial tissue. Many of them infect humans; they produce lesions on the skin (warts) and mucosa (genital warts), but some are also associated with epithelial malignancies, especially cervical cancer and other tumors of the anogenital tract and of the head and neck.1-3
The association between human warts and viruses has been documented since the early twentieth century, when the Italian G. Ciuffo reported that warts could be reproduced by inoculating a cell-free filtrate. Later, in 1935, R. Shope and E.W. Hurst showed in a rabbit model that papillomaviruses can cause skin carcinoma, and in 1949 the work of M.J. Strauss provided visual proof of the virus by electron microscopy.1,4,5
It was not until the early 1970s that H. zur Hausen proposed that a virus could cause cervical cancer in humans. In the 1980s his group used Southern blot analysis to detect, in cervical cancer biopsies, DNA from two papillomavirus types that were new at the time: types 16 and 18.4-6 Since then, many studies have established HPV as the main risk factor for cervical cancer and have shown that some types are directly involved in malignant transformation while others are not, knowledge that provides tools for the diagnosis and treatment of this important public health problem.
HPV particles are icosahedral, non-enveloped, and 52 to 55 nm in diameter. The capsid is composed of 72 pentameric capsomeres of the major capsid protein, L1, arranged on a lattice with a triangulation number (T) of 7 (Figure 1). The minor capsid protein, L2, is associated with the inner face of a subset of the L1 capsomeres. Virions are resistant to ether, acids, and heat (50 °C for one hour), and no lipid or carbohydrate components have been found in them.1,7,8 The capsid encloses the viral genome, a covalently closed, circular, double-stranded DNA molecule.1,9
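The figure of 72 capsomeres follows from the triangulation number. As a worked check, assuming the standard Caspar-Klug relation for icosahedral lattices (not stated in the original text), a lattice with triangulation number T has 10T + 2 capsomere positions; because every papillomavirus capsomere is an L1 pentamer, the number of L1 molecules per capsid is five times that count:

```latex
\begin{aligned}
N_{\text{capsomeres}} &= 10T + 2 = 10(7) + 2 = 72,\\
N_{\text{L1}} &= 5 \times N_{\text{capsomeres}} = 5 \times 72 = 360.
\end{aligned}
```

In a conventional T = 7 lattice, 12 of these positions would be pentamers and 60 hexamers; in papillomaviruses all 72 are L1 pentamers, as noted above.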
Figure 1 Structure of papillomaviruses. A) Computer-reconstructed 3D image of a virion, based on cryo-electron microscopy images and on the crystal structure of the L1 protein. Obtained from VIPER db: http://viperdb.scripps.edu/info_page.php?VDB=1l0t, accessed May 24th 2013. B) Diagram of the HPV capsid showing the major capsid protein L1 and the viral genome packaged with cellular histones. Obtained from Viral Zone: http://viralzone.expasy.org/.
Viral genome and proteins
The papillomavirus genome ranges from 6800 to 8400 base pairs (bp) and is associated with the host histones H2A, H2B, H3, and H4, forming a chromatin-like structure similar to that of the host.8,10
The genome is divided into three main regions: a non-coding regulatory region of approximately 1 kb, called the long control region (LCR); an early region, whose genes encode non-structural proteins; and a late region, whose genes encode the two structural proteins. In total there are 9 or 10 open reading frames, and in all papillomaviruses they are located on the same strand of the genomic DNA (Figure 2).1,11
Figure 2 Diagram of the human papillomavirus type 16 genome, a circular double-stranded DNA of 7904 base pairs. The early (PE) and late (PL) promoters are shown, together with the early (PAE) and late (PAL) polyadenylation signals. At the top, an enlarged view of the long control region (LCR) shows the response elements for viral and cellular transcription factors. Adapted from Doorbar et al.12
The LCR contains response elements for cellular transcription factors such as AP1, SP1, and OCT1, as well as for the viral proteins E1 and E2, which control replication and expression of the viral genome. In HPV 16, two promoters have been characterized: PE (or p97), which drives expression of the early genes, and PL (or p670), which drives expression of the late genes; in addition, differentially spliced mRNAs are produced as the epithelial cell differentiates.12
The reading frames are grouped into two sets: early expression genes (E) and late expression genes (L). The first group comprises E1, E2, E4, E5, E6, and E7, and the second comprises L1 and L2.11,12 Two additional reading frames, designated E3 and E8, can be identified in some papillomaviruses.13
Only two of the encoded proteins, L1 and L2, are structural components of the virion; the other viral proteins perform different functions during the replication cycle. Table I summarizes the general functions of the HPV proteins.
Table I Proteins of the human papillomavirus and associated functions10,13,18
Type of protein | Name | Associated functions or activities
Non-structural | E1 | Helicase activity; essential for viral replication and transcription
Non-structural | E2 | Essential for viral replication and transcription, genome segregation, and encapsidation
Non-structural | E4 | Regulates late gene expression; controls viral maturation and release of virions
Non-structural | E5 | Stimulates the transforming activity of E6 and E7; promotes cell fusion, generating aneuploidy and chromosomal instability; contributes to evasion of the immune response
Non-structural | E6 | Binds and induces degradation of the tumor suppressor protein p53, inhibiting apoptosis; interacts with proteins of the innate immune system; contributes to immune evasion and viral persistence; activates telomerase expression
Non-structural | E7 | Binds and induces degradation of the tumor suppressor protein pRB; increases the activity of cyclin-dependent kinases; affects the expression of S-phase genes through direct interaction with E2F transcription factors and histone deacetylase; contributes to immune evasion
Structural | L1 | Major capsid protein; recognizes receptors on the host cell; highly immunogenic and induces neutralizing antibodies
Structural | L2 | Minor capsid protein; participates in binding of the virion to the cell, cell entry, transport to the nucleus, genome release, and virion assembly
E and L denote early and late, according to when the protein is synthesized or acts during the replication cycle. Some papillomaviruses have reading frames for E3 and E8 proteins, but their functions are still unknown.
Papillomaviruses are highly specific for squamous epithelial cells, where new viral particles are synthesized. The replication cycle of papillomaviruses is commonly divided into two stages, early and late, which are linked to the differentiation state of the epithelial cells in the tissue.1,11,12
Establishment of the virus in tissue requires infection of the basal keratinocytes, often through lesions or abrasions, which suggests that cells with mitotic activity are required.12 Virion entry is initiated by the interaction of the L1 protein with heparan sulfate and syndecan-3 on the cell surface.14,15 Integrin alpha-6 has also been implicated in viral entry; its engagement induces signals through the Ras/MAPK and PI3K/Akt pathways that inhibit apoptosis.16,17
Most papillomaviruses appear to enter the cell by clathrin-dependent, receptor-mediated endocytosis. Uncoating of the virion and release of the viral genome occur in the endosome. The L2 protein and the genome then migrate to the nucleus, and several studies have shown the importance of L2 in this process.1,11,12
Once inside the nucleus, the genome is transcribed through a complex set of processes involving multiple promoters, different mRNA splicing patterns, and differential production of these transcripts in cells at different stages of differentiation.1
E1 and E2 are the first proteins to be expressed, and they control the copy number of the episomal viral genome (that is, the genome not integrated into the cellular genome), which is maintained at 20 to 100 copies per cell. Together, E1 and E2 form a complex that recruits the cellular DNA replication machinery and accessory factors for genome replication.1,11 In the suprabasal layer, expression of the E1, E2, E5, E6, and E7 genes contributes to maintenance of the viral genome and induces cell proliferation, increasing the number of infected cells and therefore viral production. In the most differentiated layers, expression of E1, E2, E6, and E7 continues and E4 begins to be expressed; E4 contributes to amplification of viral genome replication, greatly increasing the genome copy number. At the same time, transcription of the late genes L1 and L2 is activated; their products are involved in the assembly and release of new virions.11,12,18
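The layer-dependent expression pattern described above can be restated as a small lookup table. The sketch below is only an illustrative summary of this paragraph: the layer labels and helper names are hypothetical, and a gene absent from a set is merely not mentioned for that layer in the text, not asserted to be silent.

```python
# Illustrative summary of the genes the text lists for each epithelial layer.
# A gene missing from a set is not mentioned for that layer; the table does
# not claim the gene is switched off there.
EXPRESSION_BY_LAYER: dict[str, frozenset[str]] = {
    "suprabasal": frozenset({"E1", "E2", "E5", "E6", "E7"}),
    "most differentiated": frozenset({"E1", "E2", "E4", "E6", "E7", "L1", "L2"}),
}

# Late genes encoding the capsid proteins.
CAPSID_GENES = frozenset({"L1", "L2"})


def newly_listed(earlier: str, later: str) -> set[str]:
    """Genes listed for the later layer that are not listed for the earlier one."""
    return set(EXPRESSION_BY_LAYER[later] - EXPRESSION_BY_LAYER[earlier])


def capsid_genes_transcribed(layer: str) -> bool:
    """True if both late (capsid) genes are listed for this layer."""
    return CAPSID_GENES <= EXPRESSION_BY_LAYER[layer]


print(sorted(newly_listed("suprabasal", "most differentiated")))  # ['E4', 'L1', 'L2']
print(capsid_genes_transcribed("suprabasal"))  # False
```

Using sets makes the transition between layers a simple set difference, which highlights that E4 and the capsid genes are the additions in differentiated cells.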
The late functions of papillomaviruses, such as viral DNA synthesis, capsid protein synthesis, and virion assembly, occur exclusively in differentiated keratinocytes. Transcription of the late genes is directed by a specific promoter that is active only in differentiated keratinocytes. Little is known about the assembly and release of viral particles; however, genome encapsidation is assisted by L2 and facilitated by E2. Viral particles have been observed in the granular layer of the epithelium, but not in lower layers. The virus is thought to be non-cytolytic, and viral particles are probably not released before infected cells reach the cornified layer of the keratinized epithelium; however, the mechanisms are still unknown.1,11,12,18
The E6 and E7 proteins have been widely studied and are known to be important in cellular transformation. The E6 protein of high-risk viruses, for example, binds and induces degradation of the tumor suppressor protein p53, which prevents the infected cell from entering apoptosis so that it continues to harbor the virus. Similarly, the E7 protein binds and induces degradation of the tumor suppressor protein pRB, with comparable consequences. The activity of E6 against p53 has been used to distinguish high-risk from low-risk viruses: E6 from high-risk viruses is very active against p53, whereas E6 from low-risk viruses has a lower affinity for p53 and almost no effect on it.2,19,20
Another important feature usually associated with high-risk viruses is integration of the viral genome into the host cell genome, whereas the genome of low-risk viruses remains episomal. Integration has been associated with progression from high-grade lesions to invasive cancer, and it has been reported in over half of HPV 16-positive cancers and in most HPV 18-positive cancers. No specific integration sites have been identified; however, integration has been found to occur at fragile sites in regions of genomic instability. In the early stages of infection, the E6 and E7 genes are repressed by the E2 protein; when the viral genome integrates into the cellular genome, however, the E2 gene is disrupted and E2 protein is no longer synthesized, so E6 and E7 levels rise, which can immortalize and transform the cell and eventually lead to cancer.2,19-21
Several factors have complicated the classification of papillomaviruses. Unlike other viruses, papillomaviruses do not elicit a consistent humoral immune response in humans or in other mammals, so a serotype-based classification has not been possible; in addition, cell infection models and laboratory animal models are lacking.8,22
Two basic criteria are used for the initial classification of papillomaviruses: a) the host, since these viruses are highly species-specific, and b) the genetic sequence, which allows detailed distinction between isolates.8 The sequence most commonly used for papillomavirus classification is that of the highly conserved L1 gene, although other genes such as E6 and E7 have also been used. A new papillomavirus type is established when its L1 gene sequence differs by more than 10% from those of known HPV types. If the difference is 2% to 10%, the isolate is classified as a subtype, and if it is less than 2%, as a variant.23,24
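The L1 thresholds above amount to a simple decision rule applied to the closest known type. The sketch below is a simplified illustration, not the procedure used by reference centers: it assumes the sequences are already aligned to equal length, skips gap columns, treats exactly 2% and exactly 10% as "subtype" (the text does not specify boundary handling), and uses hypothetical toy sequences rather than real L1 genes.

```python
def percent_difference(seq_a: str, seq_b: str) -> float:
    """Percent of compared positions that differ between two pre-aligned sequences.

    Assumes both sequences come from the same alignment (equal length);
    columns with a gap ('-') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * differences / compared


def classify_l1_difference(diff_percent: float) -> str:
    """Apply the thresholds quoted in the text: >10% type, 2-10% subtype, <2% variant."""
    if diff_percent > 10:
        return "new type"
    if diff_percent >= 2:
        return "subtype"
    return "variant"


def classify_against_references(query: str, references: dict[str, str]) -> tuple[str, str, float]:
    """Compare the query with every known type and classify against the closest one."""
    closest, diff = min(
        ((name, percent_difference(query, seq)) for name, seq in references.items()),
        key=lambda item: item[1],
    )
    return closest, classify_l1_difference(diff), diff


REFERENCES = {  # hypothetical toy sequences, not real L1 genes
    "type_A": "ACGTACGTACGTACGTACGT",
    "type_B": "TTTTACGTACGTACGTACGA",
}
print(classify_against_references("ACGTACGTACGTACGTACGA", REFERENCES))
# ('type_A', 'subtype', 5.0)
```

Classifying against the closest reference reflects the rule that a new type must differ by more than 10% from all known types, not just from one of them.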
A potential source of confusion is that, since the discovery of the first papillomavirus, the word "type" followed by a number has been used to designate each newly discovered virus, which might suggest that a type is equivalent to a papillomavirus species.6,8 Under the current classification, however, a single species can include viruses named as different types; for example, the species Human papillomavirus 16 includes types 16, 31, 33, 35, 52, 58, and 67. This information is available in the online version of the papillomavirus classification of the International Committee on Taxonomy of Viruses (http://ictvdb.bio-mirror.cn/Ictv/fs_papil.htm, accessed May 24th 2013).
In short, papillomaviruses form the family Papillomaviridae, which as of 2013 included 170 human papillomavirus types.24 These are grouped into 16 genera, each named with a Greek-letter prefix and the ending -papillomavirus, for example Alphapapillomavirus and Betapapillomavirus. Each genus contains species; the genus Alphapapillomavirus, for example, has 15 species, including Human papillomavirus 16, which, as mentioned, comprises genetically related viruses designated by different type numbers.6,8
From a clinical point of view, the human papillomaviruses that infect the mucosa of the genital tract (which are located in genus Alphapapillomavirus) have been divided into two groups: low-risk, mainly associated with benign genital warts, and high-risk, which have a high oncogenic potential and are the causative agents of cervical cancer.25
In a 2003 multinational study (nine countries) that analyzed samples from 1918 women with cervical cancer, an epidemiological classification of HPV into high- and low-risk types was proposed on the basis of the types detected in the samples.26 In this classification, 15 HPV types were proposed as high-risk (16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, 59, 68, 73, and 82), three as probably high-risk (26, 53, and 66), and 12 as low-risk (6, 11, 40, 42, 43, 44, 54, 61, 70, 72, 81, and CP6108).
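Because the epidemiological scheme above is just three fixed lists, it maps naturally onto a lookup. The sketch below encodes only the types listed in the text; the function name and input normalization (accepting forms such as "HPV 16" or "hpv16") are illustrative assumptions, and any type not in these lists is reported as unclassified rather than guessed.

```python
# Risk groups as listed in the 2003 multinational study described in the text.
HIGH_RISK = frozenset({
    "16", "18", "31", "33", "35", "39", "45", "51",
    "52", "56", "58", "59", "68", "73", "82",
})
PROBABLE_HIGH_RISK = frozenset({"26", "53", "66"})
LOW_RISK = frozenset({
    "6", "11", "40", "42", "43", "44", "54", "61", "70", "72", "81", "CP6108",
})


def risk_group(hpv_type: str) -> str:
    """Return the epidemiological risk group for an HPV type label (hypothetical helper)."""
    # Normalize inputs such as "HPV 16", "hpv-16" or "16" to "16" (Python 3.9+).
    label = hpv_type.strip().upper().removeprefix("HPV").lstrip(" -")
    if label in HIGH_RISK:
        return "high risk"
    if label in PROBABLE_HIGH_RISK:
        return "probable high risk"
    if label in LOW_RISK:
        return "low risk"
    return "not classified in this scheme"


for t in ("HPV 16", "53", "hpv6", "cp6108", "67"):
    print(t, "->", risk_group(t))
```

Returning an explicit "not classified in this scheme" keeps types such as 67, which the study did not assign, from being silently treated as low-risk.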
A 2012 analysis reported that HPV type 16 causes 54.4% of cervical cancer cases worldwide, followed by type 18 with 16.5%, while types 52, 31, 45, 33, and 58 each cause 3% to 5% of cases.27
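Adding the reported shares shows how concentrated the burden is. The worked calculation below uses only the percentages quoted above and assumes each case is attributed to a single type, so that shares can be summed; the symbol s_n (share of cases attributed to type n) is introduced here for convenience:

```latex
\begin{aligned}
s_{16} + s_{18} &= 54.4\% + 16.5\% = 70.9\%,\\
5 \times 3\% = 15\% \;\le\; s_{52} + s_{31} + s_{45} &+ s_{33} + s_{58} \;\le\; 5 \times 5\% = 25\%.
\end{aligned}
```

Under this assumption, these seven types together would account for roughly 86% to 96% of cervical cancer cases.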
HPV types 16 and 18 are the most extensively studied from different points of view; consequently, more detailed information is available on their genetic variation, which has led to their subclassification into variants. Thus, type 16 has European (E), Asian (As), Asian-American (AA), North American (NA), African-1 (Af1), and African-2 (Af2) variants,28 whereas type 18 has European (E), African (Af), and Asian-Amerindian (AAI) variants.29
To facilitate the classification of medically important papillomaviruses, another naming system has been created. All papillomaviruses associated with cancer belong to the genus Alphapapillomavirus. When a phylogenetic tree of its members is constructed, the viruses fall into branches that, rather than being named as species (as described above), are numbered consecutively from 1 to 14, giving groups named alpha 1 (A1) through alpha 14 (A14).24,30 For example, the alpha 9 group contains types 16, 31, 33, 35, 52, 58, and 67, that is, the members of the species Human papillomavirus 16 mentioned previously.
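The nesting of family, genus, and numbered alpha group can be represented as a small tree. The sketch below is deliberately partial: only the alpha 9 group given in the text is filled in, the data layout and function names are hypothetical, and a lookup that returns None means only that the type was not entered in this table, not that it lies outside Alphapapillomavirus.

```python
from typing import Optional

# Partial, illustrative taxonomy: only the alpha 9 group named in the text is filled in.
TAXONOMY: dict[str, dict[str, dict[str, list[str]]]] = {
    "Papillomaviridae": {
        "Alphapapillomavirus": {
            "alpha 9 (A9) / species Human papillomavirus 16": [
                "16", "31", "33", "35", "52", "58", "67",
            ],
        },
    },
}


def lineage(hpv_type: str) -> Optional[list[str]]:
    """Return [family, genus, group] for a type listed in TAXONOMY, else None."""
    for family, genera in TAXONOMY.items():
        for genus, groups in genera.items():
            for group, types in groups.items():
                if hpv_type in types:
                    return [family, genus, group]
    return None


def same_group(type_a: str, type_b: str) -> bool:
    """True if both types are listed and share the same family, genus, and group."""
    a, b = lineage(type_a), lineage(type_b)
    return a is not None and a == b


print(lineage("31"))
print(same_group("16", "58"))  # True
print(lineage("18"))  # None: type 18 is not entered in this partial table
```

Storing the group name together with its species name makes explicit that, for alpha 9, the numbered phylogenetic group and the species Human papillomavirus 16 contain the same types.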
Although papillomaviruses have been widely studied because of their relevance to human health, much research remains to be done on their structure, replication cycle, and pathogenesis in order to confront the problems they cause. It is also necessary to conduct epidemiological studies in different populations to determine whether more viral types are associated with cancer, to define the types of vaccines and the viral types they should contain in the future, and to establish better ways of identifying and genetically classifying the virus and its potential to cause cancer.
Conflict of interest statement: The authors have completed and submitted the form translated into Spanish for the declaration of potential conflicts of interest of the International Committee of Medical Journal Editors, and none were reported in relation to this article.