Wednesday, November 21, 2007
Virus Genomes M.sc Biotech
Virus Genomes
The Structure & Complexity of Virus Genomes
The composition & structure of virus genomes (i.e. the nucleic acid which encodes the genetic information of the virus) are more varied than any of those seen in the entire bacterial, plant or animal kingdoms. The nucleic acid comprising the genome may be single-stranded or double-stranded, & in a linear, circular or segmented configuration. Single-stranded virus genomes may be:
positive (+)sense, i.e. of the same polarity (nucleotide sequence) as mRNA
negative (-)sense, i.e. complementary in sequence to mRNA
ambisense - a mixture of the two.
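The difference between (+)sense & (-)sense can be illustrated with a minimal sketch. The sequence below is a made-up fragment used only for illustration, & the helper name `complementary_strand` is hypothetical, not taken from any library.

```python
# Translation table mapping each RNA base to its pairing partner.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def complementary_strand(rna: str) -> str:
    """Return the complementary RNA strand, written 5'->3'."""
    # Complement every base, then reverse so the result also reads 5'->3'.
    return rna.upper().translate(RNA_COMPLEMENT)[::-1]


# Hypothetical mRNA fragment (not from any real virus).
mrna = "AUGGCUUAA"

# A (+)sense genome has the same sequence as the mRNA.
positive_sense_genome = mrna

# A (-)sense genome is complementary to the mRNA & must be copied
# into (+)sense RNA before it can be translated.
negative_sense_genome = complementary_strand(mrna)

print("(+)sense:", positive_sense_genome)
print("(-)sense:", negative_sense_genome)
```

Copying the (-)sense strand once more with the same function returns the original mRNA sequence.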
Virus genomes range in size from approximately 3,500 nucleotides (nt) (e.g. bacteriophages of the family Leviviridae such as MS2 & Qbeta) to approximately 280 kilobase pairs (kbp).
Unlike the genomes of all cells, which are composed of DNA, virus genomes may contain their genetic information encoded in either DNA or RNA.
Whatever the particular composition of a virus genome, all must conform to one condition. Since viruses are obligate intracellular parasites only able to replicate inside the appropriate host cells, the genome must contain information encoded in a form which can be recognised & decoded by the particular type of cell parasitized. Thus, the genetic code employed by the virus must match or at least be recognised by the host organism. Similarly, the control signals which direct the expression of virus genes must be appropriate to the host.
Many of the DNA viruses of eukaryotes closely resemble their host cells in terms of the biology of their genomes:
Some DNA virus genomes are complexed with cellular histones to form a chromatin-like structure inside the virus particle.
In 1970, Kates found that vaccinia virus mRNAs are polyadenylated at their 3' ends - the first observation of this phenomenon.
Split genes containing non-coding introns, protein coding exons & spliced mRNAs were first discovered in adenoviruses by Sharp in 1977.
Molecular Genetics:
As already described, the new techniques of molecular biology have had a major influence in concentrating much attention on the virus genome. Initially, the questions to be asked about any virus genome will usually include the following:
Composition - DNA or RNA, single-stranded or double-stranded, linear or circular.
Size & number of segments.
Terminal structures.
Nucleotide sequence.
Coding capacity - open reading frames.
Regulatory signals - transcription enhancers, promoters & terminators.
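Once a nucleotide sequence is available, a first estimate of coding capacity can be made by scanning for open reading frames. The following is a minimal sketch only: it assumes the standard start codon (ATG) & stop codons, looks at one strand, & uses a hypothetical function name `find_orfs`. Real genome annotation is considerably more involved.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3):
    """Return (start, end, frame) tuples for ORFs on the given strand.

    An ORF here runs from an ATG to the first in-frame stop codon.
    `start` is 0-based, `end` is exclusive & includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through the sequence one codon at a time in this frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


# Hypothetical sequence with a single short ORF: ATG AAA TTT TAG.
print(find_orfs("CCATGAAATTTTAGGG"))
```

To cover both strands of a double-stranded genome, the same scan would be repeated on the reverse complement.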
Direct analysis by electron microscopy, if calibrated with known standards, can be used to estimate the size of nucleic acid molecules. The most important single technique has been gel electrophoresis. It is most common to use agarose gels to separate large nucleic acid molecules (several megabases or kilobases) & polyacrylamide gel electrophoresis (PAGE) to separate smaller pieces (a few hundred bp down to a few nucleotides). Nucleotide sequencing is dependent on the ability to separate molecules which differ from each other by only one nucleotide in size.
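Size estimation against known standards can be sketched as a small calibration calculation. The example assumes that, over a limited range, the logarithm of fragment size falls roughly linearly with migration distance in the gel; this relationship is an assumption brought in for illustration, not something stated above, & the marker values are invented.

```python
import math


def fit_standard_curve(standards):
    """Least-squares fit of log10(size) = a + b * distance.

    `standards` is a list of (migration_distance_mm, size_bp) pairs
    for marker fragments of known size.
    """
    xs = [distance for distance, _ in standards]
    ys = [math.log10(size) for _, size in standards]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b


def estimate_size(distance, a, b):
    """Estimate fragment size (bp) from its migration distance."""
    return 10 ** (a + b * distance)


# Invented marker data: (distance in mm, size in bp).
markers = [(10.0, 10000), (20.0, 1000), (30.0, 100)]
a, b = fit_standard_curve(markers)
print(round(estimate_size(25.0, a, b)))
```

The estimate is only as good as the calibration: fragments outside the range of the standards should not be sized this way.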
The relative simplicity of virus genomes (compared with even the simplest cell) offers a major advantage - the ability to 'rescue' infectious virus from purified or cloned nucleic acids. Infection of cells caused by nucleic acid alone is referred to as transfection:
Virus genomes which consist of (+)sense RNA (i.e. the same polarity as mRNA) are infectious when the purified vRNA is applied to cells in the absence of any virus proteins. This is because (+)sense vRNA is essentially mRNA & the first event in a normally-infected cell is to translate the vRNA to make the virus proteins responsible for genome replication. In this case, direct introduction of RNA into cells merely circumvents the earliest stages of the replicative cycle.
In most cases, virus genomes which are composed of double-stranded DNA are also infectious. The events which occur here are a little more complex, since the virus genome must first be transcribed by host polymerases to produce mRNA. Using these techniques, virus can be rescued from cloned genomes, including those which have been manipulated in vitro.
RNA Virus Genomes
Positive-Strand RNA Viruses:
The ultimate size of single-stranded RNA genomes is limited by the fragility of RNA & the tendency of long strands to break. In addition, RNA genomes tend to have higher mutation rates than those composed of DNA because they are copied less accurately, which also tends to drive RNA viruses towards smaller genomes.
Single-stranded RNA genomes vary in size from those of Coronaviruses at approximately 30kb long to those of bacteriophages such as MS2 & Qbeta at about 3.5kb. Such genomes from different virus families share a number of common features:
Purified (+)sense vRNA is directly infectious when applied to susceptible host cells in the absence of any virus proteins (although it is about one million times less infectious than virus particles).
There is an untranslated region (UTR) at the 5' end of the genome which does not encode any proteins & a shorter UTR at the 3' end. These regions are functionally important in virus replication & are thus conserved in spite of the pressure to reduce genome size.
Both ends of (+)stranded eukaryotic virus genomes are often modified, the 5' end by a small, covalently attached protein or a methylated nucleotide 'cap' structure & the 3' end by polyadenylation. These signals allow vRNA to be recognised by host cells & to function as mRNA.
Negative-Strand RNA Viruses:
Viruses with negative-sense RNA genomes are a little more diverse than positive-stranded viruses. Possibly because of the difficulties of expression, they tend to have larger genomes encoding more genetic information. Because of this, segmentation is a common though not universal feature of such viruses.
Negative-sense RNA genomes are not infectious as purified RNA. Such virus particles all contain a virus-specific polymerase, which is absent from the purified RNA. The first event when the virus genome enters the cell is that the (-)sense genome is copied by the polymerase, forming either (+)sense transcripts which are used directly as mRNA, or a double-stranded molecule known either as the replicative intermediate (RI) or replicative form (RF), which serves as a template for further rounds of mRNA synthesis. Therefore, since purified negative-sense genomes cannot be directly translated & are not replicated in the absence of the virus polymerase, these genomes are inherently non-infectious.
Ambisense Genome Organization:
Some RNA viruses are not strictly 'negative-sense' but ambisense, since they are part (-)sense & part (+)sense.
DNA Virus Genomes
'Small' DNA Genomes:
Bacteriophages have been extensively studied as examples of DNA virus genomes. Although they vary considerably in size, in general terms they tend to be relatively small.
The structure of the bacteriophage M13 genome has been studied in great detail & modified extensively for use as a vector for DNA sequencing. The genome of this virus is:
circular
single-stranded DNA
approximately 7,200 nucleotides long
Unlike other virion structures, the filamentous M13 capsid can be lengthened by the addition of further protein subunits. The genome size of this virus can also be increased by the addition of extra sequences in the non-essential intergenic region without the penalty of becoming incapable of being packaged into the capsid. This is very unusual. In other viruses, the packaging constraints are much more rigid, e.g. in phage lambda, only DNA of between approximately 95% - 110% (approximately 46kbp - 54kbp) of the normal genome size (49kbp) can be packaged into the virus particle.
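The lambda packaging window quoted above is simple arithmetic on the normal genome size. A minimal sketch, using the approximate figures from the text (49kbp, 95% & 110%); the function names are hypothetical:

```python
LAMBDA_GENOME_BP = 49_000   # approximate normal genome size from the text
MIN_PERCENT = 95            # lower packaging limit, % of genome size
MAX_PERCENT = 110           # upper packaging limit, % of genome size


def packaging_window(genome_bp=LAMBDA_GENOME_BP,
                     min_percent=MIN_PERCENT,
                     max_percent=MAX_PERCENT):
    """Return the (shortest, longest) packageable DNA length in bp."""
    # Integer arithmetic avoids floating-point rounding noise.
    return genome_bp * min_percent // 100, genome_bp * max_percent // 100


def can_be_packaged(length_bp):
    """True if DNA of this length falls inside the packaging window."""
    shortest, longest = packaging_window()
    return shortest <= length_bp <= longest


print(packaging_window())        # window in bp, roughly 46kbp - 54kbp
print(can_be_packaged(52_000))   # a genome carrying a 3kbp insert
print(can_be_packaged(56_000))   # a genome carrying a 7kbp insert
```

This is why the size of any sequence inserted into a lambda-based vector has to be balanced against what is deleted from the genome.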
Not all bacteriophages have such simple genomes as M13, e.g. the genome of lambda is approximately 49kbp & that of phage T4 about 160kbp double-stranded DNA. These latter two bacteriophages also illustrate another common feature of linear virus genomes - the importance of the sequences present at the ends of the genome:
In the case of lambda, the substrate which is packaged into the phage heads during assembly consists of long concatemers of phage DNA which are produced during the later stages of vegetative replication. The DNA is 'reeled in' by the phage head & when a complete genome has been incorporated, cleaved at a specific sequence by a phage-encoded endonuclease. This enzyme leaves a 12bp 5' overhang on the end of each of the cleaved strands, known as the cos site. Hydrogen bond formation between these 'sticky ends' can result in the formation of a circular molecule. In a newly infected cell, the gaps on either side of the cos site are closed by DNA ligase & it is this circular DNA which undergoes vegetative replication or integration into the bacterial chromosome.
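Circularization works because the two single-stranded overhangs are complementary to each other. The sketch below checks this for a pair of 12-nucleotide overhangs. The sequence used is a placeholder invented for the example, not the actual cos sequence, & the function names are hypothetical.

```python
# Translation table mapping each DNA base to its pairing partner.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return dna.upper().translate(DNA_COMPLEMENT)[::-1]


def ends_can_anneal(left_overhang: str, right_overhang: str) -> bool:
    """True if two 5' overhangs (both written 5'->3') can base-pair.

    Two single strands pair along their full length only if one is
    the reverse complement of the other.
    """
    return left_overhang.upper() == reverse_complement(right_overhang)


# Placeholder 12-nt overhang (NOT the real cos sequence).
left_end = "GATTACAGGCTA"
right_end = reverse_complement(left_end)

print(ends_can_anneal(left_end, right_end))   # complementary ends
print(ends_can_anneal(left_end, left_end))    # identical ends
```

Only the complementary pair can hold the ends together long enough for DNA ligase to seal the molecule into a circle.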
Bacteriophage T4 illustrates another molecular feature of certain linear virus genomes, terminal redundancy. Replication of the T4 genome also produces long concatemers of DNA. These are cleaved by a specific endonuclease, but unlike the lambda genome, the lengths of DNA incorporated into the particle are somewhat longer than a complete genome length. Therefore, some genes are repeated at each end of the genome, & the DNA packaged into the phage particles contains reiterated information.
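Terminal redundancy follows directly from cutting a concatemer into pieces longer than one genome. In the toy sketch below each letter stands for a gene, the genome is ten 'genes' long & each particle receives twelve. The assumption that successive pieces are cut one after another along the same concatemer is made for the illustration, & the function name is hypothetical.

```python
def package_from_concatemer(genome: str, packaged_length: int,
                            n_particles: int):
    """Cut successive pieces of `packaged_length` from a concatemer."""
    if packaged_length <= len(genome):
        raise ValueError("packaged length must exceed one genome length")
    # Build a concatemer long enough to fill every particle.
    copies_needed = (packaged_length * n_particles) // len(genome) + 1
    concatemer = genome * copies_needed
    return [concatemer[i * packaged_length:(i + 1) * packaged_length]
            for i in range(n_particles)]


# Toy genome: each letter represents one gene.
particles = package_from_concatemer("ABCDEFGHIJ", packaged_length=12,
                                    n_particles=3)
for dna in particles:
    print(dna)
```

Every packaged molecule carries the same two 'genes' at both ends, & every gene of the genome is present in each particle even though the end points differ from particle to particle.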
As further examples of small DNA genomes, consider those of two families of animal viruses, the parvoviruses & polyomaviruses:
Parvovirus genomes are:
linear
non-segmented
(+)sense
single-stranded DNA
about 5kb long
These are very small genomes, & even the replication-competent parvoviruses contain only two genes: rep, which encodes proteins involved in transcription, & cap, which encodes the coat proteins. The ends of the genome have palindromic sequences of about 115nt, which form 'hairpins'. These structures are essential for the initiation of genome replication, once again emphasising the importance of the sequences at the ends of the genome.
The genomes of polyomaviruses consist of double-stranded, circular DNA molecules, approximately 5kbp in size.
The entire nucleotide sequence of all the viruses in the family is known & the architecture of the polyomavirus genome (i.e. number & arrangement of genes & function of the regulatory signals & systems) has been studied in great detail at a molecular level. Within the particles, the virus DNA is associated with four cellular histones. The genomic organization of these viruses has evolved to pack maximal information (6 genes) into minimal space (5kbp). This has been achieved by the use of both strands of the genome DNA & overlapping genes.
'Large' DNA Genomes
There are a number of virus groups which have double-stranded DNA genomes of considerable size & complexity. In many respects, these viruses are genetically very similar to the host cells which they infect. Two examples of such viruses are the adenovirus & herpesvirus families:
Herpesvirus genomes:
The herpesviruses are a large family containing more than 100 different members, at least one for most animal species which have been examined to date, including eight human herpesviruses.
Herpesviruses have very large genomes composed of up to 230kbp linear, double-stranded DNA. The different members of the family are widely separated in terms of genomic sequence & proteins, but all are similar in terms of structure & genome organization.
Some herpesvirus genomes consist of two covalently joined sections, a unique long (UL) & a unique short (US) region, each bounded by inverted repeats. The repeats allow structural rearrangements of the unique regions & therefore, these genomes exist as a mixture of four isomers, all of which are functionally equivalent.
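The number four comes from two regions which can each be in either of two orientations. A minimal sketch, in which each letter stands for a gene, the flanking repeats are left out for simplicity & the function name is hypothetical:

```python
from itertools import product


def genome_isomers(unique_long: str, unique_short: str):
    """List the arrangements obtained by inverting UL and/or US.

    Each region is either in its original orientation or inverted,
    independently of the other: 2 x 2 = 4 combinations.
    """
    isomers = []
    for invert_long, invert_short in product((False, True), repeat=2):
        long_part = unique_long[::-1] if invert_long else unique_long
        short_part = unique_short[::-1] if invert_short else unique_short
        isomers.append(long_part + "|" + short_part)
    return isomers


# Toy regions: upper case = genes in UL, lower case = genes in US.
for isomer in genome_isomers("ABC", "xy"):
    print(isomer)
```

All four arrangements contain the same genes, which fits with the isomers being functionally equivalent.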
Herpesvirus genomes also contain multiple repeated sequences & depending on the number of these, the genome size of various isolates of a particular virus can vary by up to 10kbp.
Adenovirus genomes:
The genomes of adenoviruses consist of linear, double-stranded DNA of 30-38kbp. These viruses contain 30-40 genes. The terminal sequences of each DNA strand are inverted repeats of 100-140bp & therefore, the denatured single strands can form 'panhandle' structures. These structures are important in DNA replication.
Although adenovirus genomes are considerably smaller than those of herpesviruses, the expression of the genetic information is rather more complex. Clusters of genes are expressed from a limited number of shared promoters. Multiply-spliced mRNAs & alternative splicing patterns are used to express a variety of polypeptides from each promoter. In contrast, herpesvirus genes each tend to be expressed from their own promoter - resulting in a much larger genome.
Segmented & Multipartite Virus Genomes
Segmented virus genomes are those which are divided into two or more physically separate molecules of nucleic acid, all of which are then packaged into a single virus particle.
Multipartite genomes are those which are segmented & where each genome segment is packaged into a separate virus particle. These discrete particles are structurally similar & may contain the same component proteins, but often differ in size depending on the length of the genome segment packaged.
There are many examples of segmented virus genomes, including many human, animal & plant pathogens such as orthomyxoviruses, reoviruses & bunyaviruses. There are rather fewer examples of multipartite viruses, all of which infect plants. These include:
bipartite viruses (which have two genome segments/virus particles)
tripartite viruses (three genome segments/virus particles)
Separating the genome segments into different particles removes the requirement for accurate sorting, but introduces a new problem in that all of the discrete virus particles must be taken up by a single host cell to establish a productive infection. This is perhaps the reason why multipartite viruses are only found in plants. Many of the sources of infection by plant viruses, such as inoculation by sap-sucking insects or after physical damage to tissues, result in a large input of infectious virus particles, providing the opportunity for infection of an initial cell by more than one particle.
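The effect of inoculum size can be put into rough numbers. The sketch below assumes that the number of particles of each type entering a cell follows an independent Poisson distribution with the same mean; this model is an assumption made for illustration & is not part of the argument above. The function name is hypothetical.

```python
import math


def prob_all_segments(mean_particles_per_type: float,
                      n_segment_types: int) -> float:
    """Probability that a cell receives at least one particle of every type.

    Assumes an independent Poisson distribution per particle type.
    """
    # P(at least one particle of a given type) = 1 - P(zero particles).
    p_at_least_one = 1.0 - math.exp(-mean_particles_per_type)
    return p_at_least_one ** n_segment_types


for mean in (0.1, 1.0, 5.0):
    bipartite = prob_all_segments(mean, 2)
    tripartite = prob_all_segments(mean, 3)
    print(f"mean={mean}: bipartite={bipartite:.3f}, tripartite={tripartite:.3f}")
```

Under this model a productive infection is unlikely at low input & becomes close to certain only when many particles of each type reach the same cell, & the requirement is stricter for tripartite than for bipartite viruses.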
SUMMARY:
There is more genetic diversity among viruses than in all the rest of the Animal, Plant & Bacterial kingdoms, all of whose genomes consist of double-stranded DNA.
The expression of virus genetic information is dependent on the structure of the genome of the particular virus concerned, but in every case, the genome must be recognized & expressed using the mechanisms of the host cell.