
Protein Synthesis from DNA



From DNA code to Protein code


Proteins are large biological molecules: polymers of amino acids (the monomers) joined together by peptide bonds.
To form a protein, many amino acids are linked by dehydration reactions. Each reaction forms a peptide bond between the carboxyl group of one amino acid and the amino group nitrogen of the next, releasing one molecule of water. Peptide bonds are covalent (and therefore strong) and help proteins keep their complex structures conformationally stable so that they can perform their functions.
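
As a rough illustration of the dehydration reaction described above, the following Python sketch estimates the mass of a linear peptide from the masses of its free amino acids, subtracting one water molecule per peptide bond. The function name `peptide_mass` and the rounded average masses used for glycine and alanine are assumptions made for this example; they are not part of the original text.

```python
WATER_MASS = 18.015  # approximate average mass of H2O, in daltons


def peptide_mass(free_amino_acid_masses):
    """Estimate the mass of a linear peptide.

    A chain of n amino acids contains n - 1 peptide bonds, and each bond
    is formed by a dehydration reaction that releases one water molecule.
    """
    n = len(free_amino_acid_masses)
    if n == 0:
        return 0.0
    peptide_bonds = n - 1
    return sum(free_amino_acid_masses) - peptide_bonds * WATER_MASS


# Approximate average masses of the free amino acids (illustrative values).
GLYCINE = 75.067
ALANINE = 89.094

# A glycine-alanine dipeptide has one peptide bond, so one water is lost.
dipeptide = peptide_mass([GLYCINE, ALANINE])
print(round(dipeptide, 3))
```

The same function works for longer chains: a chain of 100 amino acids would lose 99 water molecules.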

Amino acids are the building blocks of proteins and are also found as intermediates in metabolism. The precise amino acid content of a particular protein, and the sequence of those amino acids, is determined by the sequence of bases in the gene that encodes that protein. The chemical properties of the amino acids and the interactions between them determine the protein’s final 3D shape and hence its biological activity. Proteins catalyze most of the chemical reactions in living cells, without which life itself would not be possible.

Amino acids are organic compounds composed of an amine (-NH2) and a carboxylic acid (-COOH) functional group, along with a side chain (R) specific to each amino acid.

Amino Acid

Dehydration reaction


There are 20 standard amino acids. Some are essential, meaning they must be obtained from the diet; the others are non-essential or semi-essential, as they can be synthesized from precursors in chemical reactions inside the eukaryotic cell.

Protein Structures


The primary structure of a protein is its amino acid sequence. Electrostatic and other non-covalent interactions drive the folding and intramolecular bonding of the linear amino acid chain, which ultimately determines the protein’s unique three-dimensional shape. Hydrogen bonding between the amino (N-H) and carbonyl (C=O) groups of the backbone in neighboring regions of the chain causes certain regular folding patterns to occur.

The most common stable folding patterns are alpha helices and beta sheets. These make up the secondary structure of a protein. Most proteins contain multiple helices and sheets, in addition to other, less common patterns. A polypeptide folds into a 3D shape under the attraction and repulsion forces between its own constituent functional groups; this overall fold is the tertiary structure. Finally, quaternary structure refers to the arrangement of multiple polypeptide chains, or subunits, in proteins that have more than one.




Introduction: DNA is a nucleic acid.

Nucleic acids are the biopolymers that store the information and instructions an organism needs to develop, survive and reproduce.

These biopolymers are assembled from monomers called nucleotides. Each nucleotide is composed of:

  •  a phosphate group
  •  a five-carbon sugar
  •  a nitrogenous base (the base attached to the sugar, without the phosphate, is called a nucleoside).

It is the specific order in which the nucleotides are arranged, with no limit on the length of the final transcript, that codes for the immense diversity of proteins and their varied functions.
DNA sequences are converted into messages that are used to synthesize proteins, the complex macromolecules that do most of the work in our bodies; they are required for the structure, function, and regulation of the body’s tissues and organs.

DNA is a double-stranded molecule held together by weak hydrogen bonds between the paired bases of its nucleotides. The molecule forms a double helix in which the two strands of DNA spiral about each other. The double helix looks like a very long ladder twisted into a coil: the sides are formed by a backbone of sugar and phosphate molecules, and the rungs consist of nucleotide bases joined weakly in the middle by hydrogen bonds.

  • The relationship between structure and function is manifest in the double helix.
  • Since the two strands of DNA are complementary, each strand can act as a template: either for building a new strand during replication, or for copying from one strand the particular, limited sequences of DNA bases that code for a specific protein.


DNA structure:

  1. DNA is a double-stranded helix with antiparallel strands built from four nucleotides. A pairs with T through two hydrogen bonds, and C pairs with G through three hydrogen bonds.
  2. These nucleotides are linked by phosphodiester bonds that join the 5’ carbon of one deoxyribose sugar to the 3’ carbon of the next.

              Phosphate group + 5-carbon sugar + Nitrogenous base = Nucleotide = DNA monomer
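
The base-pairing rules above can be sketched in a few lines of Python. This is a minimal illustration, assuming strands are written as strings of the letters A, T, C and G in the 5’ to 3’ direction; the names `reverse_complement` and `hydrogen_bonds` are made up for this example.

```python
# Watson-Crick pairing: A with T, C with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Hydrogen bonds contributed by the base pair each base takes part in:
# A-T pairs have two, C-G pairs have three.
H_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}


def reverse_complement(strand):
    """Return the complementary strand, also written 5' to 3'.

    Because the two strands are antiparallel, the complement is read
    in the opposite direction, hence the reversal.
    """
    return "".join(PAIR[base] for base in reversed(strand))


def hydrogen_bonds(strand):
    """Count the hydrogen bonds holding this strand to its complement."""
    return sum(H_BONDS[base] for base in strand)


print(reverse_complement("ATGC"))
print(hydrogen_bonds("ATGC"))
```

Sequences rich in C and G pairs carry more hydrogen bonds per base pair than sequences rich in A and T pairs.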


The information content of DNA takes the form of specific sequences of nucleotides along the DNA strands. When inherited by an organism, these sequences lead to specific traits by dictating the synthesis of proteins.
The process by which DNA directs protein synthesis, gene expression, includes two stages, called transcription and translation.


              DNA --> RNA --> Protein


  • Transcription
    • Is the synthesis of RNA under the direction of DNA
    • Produces messenger RNA (mRNA)


  • Translation
    • Is the actual synthesis of a polypeptide, which occurs under the direction of mRNA
    • Occurs on ribosomes.
    • In a eukaryotic cell, the nuclear envelope separates transcription (in the nucleus) from translation (in the cytoplasm)
    • Extensive RNA processing occurs in the nucleus before the mRNA is exported for translation.


Transcription is the DNA-directed synthesis of RNA

Messenger RNA (mRNA) synthesis is catalyzed by RNA polymerase, which pries the DNA strands apart and hooks together the RNA nucleotides. It follows the same base-pairing rules as DNA, except that in RNA, uracil substitutes for thymine.

RNA, in this case messenger RNA (mRNA), is single-stranded rather than double-stranded like DNA. It is also short (typically only one gene long), whereas DNA is very long and contains many genes. RNA uses the sugar ribose instead of the deoxyribose used by DNA.

The stages of transcription are initiation, elongation, and termination.


Promoters signal the initiation of RNA synthesis.
Transcription factors help eukaryotic RNA polymerase recognize promoter sequences.

Transcription Factors:

Transcription factors are a large group of proteins involved in transcribing DNA into messenger RNA. They are responsible for the initiation and regulation of mRNA transcription output (the volume of mRNA molecules produced) and, through interaction with other molecules, can turn the process on or off. Regulation of transcription is the most common form of gene expression control, and is often the target of pharmacological intervention.

RNA polymerase synthesizes a single strand of RNA against the DNA template strand (the anti-sense strand), adding nucleotides to the 3’ end of the growing RNA chain.
As RNA polymerase moves along the DNA, it continues to untwist the double helix, exposing about 10 to 20 DNA bases at a time for pairing with RNA nucleotides.
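
The pairing of RNA nucleotides against the template strand can be sketched as a simple mapping. This is an illustrative simplification, assuming the template is supplied as a string read in the 3’ to 5’ direction, so that the resulting mRNA comes out written 5’ to 3’; the function name `transcribe` is chosen for this example only.

```python
# DNA template base -> RNA base paired with it.
# Same rules as DNA pairing, except that uracil (U) replaces thymine (T).
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_3_to_5):
    """Build the mRNA (5' to 3') from a DNA template strand read 3' to 5'.

    Each new RNA nucleotide is added to the 3' end of the growing chain,
    which here corresponds to appending to the end of the string.
    """
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)


print(transcribe("TACGGT"))
```

For the template `TACGGT`, the transcript begins with AUG, since T pairs with A, A with U, and C with G.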


Specific sequences in the DNA signal the termination of transcription.

When one of these is encountered by the polymerase, the RNA transcript is released from the DNA and the double helix can zip up again.

Post-termination processing: Most eukaryotic mRNAs are not ready to be translated into protein directly after being transcribed from DNA. The mRNA requires processing, which occurs in the nucleus. After this, the messenger RNA moves to the cytoplasm for translation.
The cell adds a protective cap to one end (the 5’ end) and a tail of A’s to the other (the 3’ end). Both protect the RNA from enzymes that would degrade it.

Most eukaryotic genes are interrupted by non-coding regions called introns. Non-coding regions may have specific chromosomal functions or regulatory purposes. Introns also allow for alternative RNA splicing. Thus, an RNA copy of a gene is converted into messenger RNA by doing two things: adding protective nucleotides to the ends and cutting out the introns.
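
The two processing steps can be sketched as string operations. This is only a toy model: it assumes the intron positions are already known and given as (start, end) index pairs, and it represents the 5’ cap with the placeholder text `[cap]`. The function names, the placeholder, and the example sequence are all hypothetical.

```python
def splice(pre_mrna, introns):
    """Cut out the introns and join the remaining pieces.

    `introns` is a list of (start, end) positions, 0-based and
    end-exclusive, assumed not to overlap.
    """
    kept = []
    position = 0
    for start, end in sorted(introns):
        kept.append(pre_mrna[position:start])  # keep the part before the intron
        position = end                         # skip over the intron
    kept.append(pre_mrna[position:])           # keep whatever follows the last intron
    return "".join(kept)


def process(pre_mrna, introns, tail_length=10):
    """Splice the transcript, then add a 5' cap marker and a tail of A's."""
    return "[cap]" + splice(pre_mrna, introns) + "A" * tail_length


# Toy transcript with one intron at positions 3 to 8.
print(splice("AUGGUAAGUCCAUAA", [(3, 9)]))
print(process("AUGGUAAGUCCAUAA", [(3, 9)], tail_length=5))
```

Passing a different list of intron positions for the same transcript gives a different mature mRNA, which is the idea behind alternative splicing.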

Translation is the messenger RNA-directed synthesis of a polypeptide, and it occurs at ribosomes. Two languages are involved: one is the sequence of mRNA bases (copied from the DNA) grouped in triplets, or codons, and the other is the amino acid sequence of the protein to be synthesized. An appropriate translation from the first to the second must therefore occur.

Translation: From DNA Code to Protein Code and Assembly

RNA codons code for specific amino acids. The order of the bases in a codon matters for the amino acid specified. Any of the four RNA nucleotides may occupy each of the three codon positions, so there are 64 possible codons. Sixty-one codons specify amino acids and three (UAA, UAG and UGA) function as stop signals that mark the end of protein synthesis. The codon AUG codes for methionine and also serves as the start signal for translation. Multiple codons may specify the same amino acid.

In short, the universal genetic code:


  • Codon: a three-base code for one specific amino acid, that is, a sequence of three nucleotides drawn from the four available
  • Since there are 4 bases and 3 positions in each codon, there are 4^3 = 64 possible codons
  • There are 64 codons but only 20 amino acids, so most amino acids are specified by more than one codon
  • 3 of the 64 codons are used as STOP signals; one is found at the end of every coding sequence and marks the end of the protein
  • One codon (AUG) is used as a START signal: it is at the start of every coding sequence


Thus, a codon in messenger RNA is either translated into an amino acid or serves as a translational stop signal; the start codon does both, marking the start of translation and coding for methionine.
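
The codon arithmetic can be checked by enumerating every triplet. The sketch below uses Python's standard `itertools.product`; the variable names are chosen for this example.

```python
from itertools import product

RNA_BASES = "ACGU"
STOP_CODONS = {"UAA", "UAG", "UGA"}
START_CODON = "AUG"

# Every ordered choice of one base for each of the three positions: 4 * 4 * 4.
all_codons = ["".join(triplet) for triplet in product(RNA_BASES, repeat=3)]

# Codons that specify an amino acid (everything except the stop codons).
sense_codons = [codon for codon in all_codons if codon not in STOP_CODONS]

print(len(all_codons))              # total number of possible codons
print(len(sense_codons))            # codons that code for an amino acid
print(START_CODON in sense_codons)  # the start codon also codes for methionine
```

With 61 amino acid codons shared among 20 amino acids, several codons must map to the same amino acid.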



At this point, translation requires:

1. The mRNA
2. An assembled ribosome, with both its large and small subunits
3. tRNA



A ribosome is essentially an assembly facility for producing proteins. Each complete ribosome is constructed from two subunits:
  • The large subunit performs the catalytic function, forming the peptide bond between consecutive amino acids. Because this catalysis is carried out by RNA, the ribosome is a ribozyme.
  • The small subunit is mainly where decoding occurs, that is, the matching of mRNA codons to tRNA anticodons. It links up with the mRNA and then locks on to the large subunit, completing the assembly of the synthetic machinery.


Transfer RNA, tRNA:


Transfer RNA is a small RNA molecule that participates in protein synthesis. Each tRNA has two functional regions: the anticodon (a set of three nucleotide bases that pair with a triplet, or codon, of the mRNA) and a site that binds the particular amino acid.
To ensure that the appropriate amino acid is added to the growing chain, the tRNA anticodon binds by base pairing to its complementary codon in the mRNA.
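
Codon and anticodon pairing can be sketched as follows. This is a simplified illustration, assuming strict A-U and C-G pairing with both sequences written 5’ to 3’. Real tRNAs also tolerate some non-standard ("wobble") pairing at the third codon position, which this sketch ignores. The function name `anticodon` is chosen for this example.

```python
# RNA base pairing: A with U, C with G.
RNA_PAIR = {"A": "U", "U": "A", "C": "G", "G": "C"}


def anticodon(codon):
    """Return the anticodon (5' to 3') that pairs with an mRNA codon (5' to 3').

    The codon and anticodon pair in antiparallel orientation, so the
    complement is reversed to keep both written 5' to 3'.
    """
    return "".join(RNA_PAIR[base] for base in reversed(codon))


# The start codon AUG pairs with the anticodon of the methionine tRNA.
print(anticodon("AUG"))
```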


The ribosome matches the base sequence of the mRNA, in sets of three bases called codons, to tRNA molecules that have the three complementary bases in their anticodon regions. Again, the base-pairing rule is important in this recognition (A binds to U and C binds to G). The ribosome moves along the mRNA, reading three bases at a time and adding the corresponding amino acids to the polypeptide chain. When the ribosome reaches one of the “stop” codons, it releases both the polypeptide and the mRNA. The polypeptide then folds into an initial conformation, awaiting any additional post-translational modifications.

As in transcription, initiation, elongation and termination are the steps involved in translation.

  1. The ribosome binds to mRNA at a specific area.
  2. The ribosome starts matching tRNA anticodon sequences to the mRNA codon sequence.
  3. Each time a new tRNA comes into the ribosome, the amino acid that it was carrying gets added to the elongating polypeptide chain.
  4. The ribosome continues until it reaches a stop codon, then it releases the polypeptide and the mRNA.
  5. The polypeptide folds into its native shape and starts acting as a functional protein in the cell.
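
The steps above can be sketched as a short Python function. This is a toy model rather than a description of the real machinery: it assumes translation starts at the first AUG found in the mRNA, uses the standard genetic code written with one-letter amino acid symbols (`*` marks a stop codon), and ignores tRNAs, ribosome subunits and folding. The names `CODON_TABLE` and `translate` are chosen for this example.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G order
# for each codon position (UUU, UUC, UUA, UUG, UCU, ...).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate an mRNA (5' to 3') into a string of one-letter amino acids."""
    start = mrna.find("AUG")          # step 1: locate the start codon
    if start == -1:
        return ""                     # no start codon, nothing is made
    polypeptide = []
    # Steps 2 and 3: read one codon (three bases) at a time.
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":         # step 4: stop codon, release the chain
            break
        polypeptide.append(amino_acid)
    return "".join(polypeptide)


print(translate("GGAUGCCAUAAGG"))
```

In the example, reading begins at AUG (methionine, M), continues with CCA (proline, P), and ends at the stop codon UAA.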


The new polypeptide is now floating free in the cytoplasm, or bound to a membrane if it was synthesized at the endoplasmic reticulum. It may fold spontaneously into its active configuration, or other molecules may first be attached to it: sugars, lipids, phosphates, etc. These additions serve specific purposes in protein function and result from post-translational modification performed by the appropriate enzymes.