The Basics of DNA and Protein Synthesis

You have probably heard of DNA, or heard something like "it is in your genes" or other similar expressions. What does this mean though? What is a gene, what is DNA, and how does any of that do anything to make a person?

To start off, let's try to get a big picture of what happens in protein synthesis.

What is a protein?

You likely have heard about protein in food, maybe seen videos or heard people talking about protein intake or getting enough protein in food, right?

A protein is just one or more chains of amino acids linked by peptide bonds. Each chain is called a polypeptide, and it is folded into a specific configuration. That specific folding determines the function of the protein.

If you have ever seen a small child playing with a shape puzzle you get the idea that the square peg goes in the square hole.

Video: square hole video

In a much more complicated way, a specific arrangement of amino acids makes a hemoglobin-shaped peg that can accept oxygen and carry it in your blood.

Hemoglobin structure
Image: Hemoglobin shaped peg

Another arrangement and folding produces a collagen shaped peg that provides structure to skin and bones.

Collagen structure
Image: Collagen shaped peg

Amino Acids and Peptides

So what determines how a protein folds? The order of the amino acids in a peptide. So what is a peptide? Just a strand of amino acids. You have probably heard of Ozempic or other GLP-1 drugs, which are just peptides, and short ones at that.

So what is an amino acid? It is just a molecule with an amino group (NH2), a carboxyl group (COOH), and some extra side bits (the side chain, which is what makes each amino acid different).

Amino acid molecular structure
Image: Amino acid structure

One important thing about amino acids is that they contain nitrogen. Making them from scratch means getting nitrogen from the air (as N2, which doesn't like to react with other things) in a process called nitrogen fixation, which only certain microbes can do. Humans and other animals cannot fix nitrogen ourselves. We get our nitrogen, already built into amino acids, by eating plants or other animals.

There are over 500 known amino acids, but just 20 of them are coded for directly in DNA. These 20 special amino acids are called proteinogenic amino acids, meaning they create proteins. Neat, right? (N.b. selenocysteine and pyrrolysine are the 21st and 22nd in some organisms, but they’re not part of the standard 20 and use special mechanisms.)

So proteins are just made from peptides which are made up of amino acids, and DNA tells us which amino acids go where. DNA stores the instructions. A gene is a specific segment of DNA that contains the code for making one (or sometimes more) proteins. Through the processes of transcription and translation (protein synthesis), the cell reads the DNA sequence and assembles the correct sequence of amino acids into a polypeptide (a really long peptide) chain. Once that chain folds properly, it becomes a functional protein.

Transcription and Translation Overview

So the two big steps that turn DNA into proteins are transcription and translation. In transcription, DNA is copied into an intermediate form called RNA. There are actually a few different types of RNA used, but don't worry about that now. You can think of DNA as long-term storage, like a garage or a shed where you keep holiday decorations until you need to put them out. RNA is more like a gallon of milk: it only stores stuff for a bit between the cow (or plant) and you drinking it. Translation is the part where cells take the RNA and use it as a guide to assemble proteins.

So transcription is DNA → RNA, and translation is RNA → protein.

The Structure of DNA

DNA (DeoxyriboNucleic Acid) is made out of 4 bases: adenine, thymine, guanine, and cytosine, usually abbreviated as A, T, G, and C.

DNA Double Helix showing ATGC bases
Image: DNA structure with 4 bases

We call these nucleotide bases or nucleobases.

DNA is made of two strands which mirror each other. If you have an A on one side, you will have a T on the mirror side, and a C is mirrored by a G. We often say the mirror versions are the ones they are "paired" with. So A pairs with T, and G pairs with C.
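
The pairing rules can be sketched as a small lookup in Python. This is only an illustration: the names `PAIRS` and `complement` and the example strand are made up for this sketch.

```python
# Base-pairing rules from the text: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the paired base for every base in `strand`, position by position."""
    return "".join(PAIRS[base] for base in strand.upper())


# Made-up example strand.
print(complement("ATGCCGTA"))  # TACGGCAT
```

Mirroring a strand twice gives back the strand you started with, which is a handy way to check your answer.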

Let's try a quick exercise.

Problems

Type the complementary DNA sequence for the given strand. Remember the pairing rules!

Given DNA Strand:

Reading the DNA

When we want to make a protein we start by unwinding the DNA so it is straighter, then we separate the two strands. DNA is long and you might wonder how we know which way to read it and how we know where to start.

In DNA we have a sense side and an anti-sense side, so if "ATG" is on the sense side, "TAC" would be on the anti-sense side. DNA also has a direction: one end we call the 3' end (which has a hydroxyl group (OH) at the 3' carbon of the deoxyribose ring) and the other we call the 5' end (which usually has a phosphate group attached to the 5' carbon of the deoxyribose ring).

When we transcribe DNA into RNA we read the anti-sense side from 3' to 5'. Which side counts as sense depends on the gene: both strands of DNA have both sense and anti-sense sections.
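
Here is a rough sketch of how direction and mirroring fit together. It assumes the standard convention that the two strands run in opposite directions, so the anti-sense strand lines up 3' to 5' under a sense strand written 5' to 3'. The function names and the example sequence are made up for illustration.

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def aligned_antisense(sense_5_to_3: str) -> str:
    """Anti-sense bases lined up under the sense strand (so this reads 3'->5')."""
    return "".join(PAIRS[base] for base in sense_5_to_3.upper())


def antisense_5_to_3(sense_5_to_3: str) -> str:
    """The same anti-sense strand, flipped so it is written 5'->3'."""
    return aligned_antisense(sense_5_to_3)[::-1]


sense = "ATGGCC"  # made-up sense strand, written 5'->3'
print("5'-" + sense + "-3'")                      # 5'-ATGGCC-3'
print("3'-" + aligned_antisense(sense) + "-5'")   # 3'-TACCGG-5'
print(antisense_5_to_3(sense))                    # GGCCAT
```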

How do we know where a gene starts if there are just a bunch of A's, T's, G's, and C's? Well, one thing we look for is what is called a TATA-box. A TATA-box is just a sequence like 5'-TATA(A/T)A(A/T)-3', where (A/T) means either an A or a T. The gene is located just a bit downstream from there. Not every gene has a TATA-box, but usually there is a similar signal.

Every protein-coding sequence starts with a start codon, which is just the 3 bases ATG on the sense side (TAC on the DNA template side, which is the anti-sense side). A codon is just a set of 3 bases. An anti-codon is just the mirror version of a codon.
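
As a sketch, the TATA-box pattern maps neatly onto a regular expression, where `[AT]` means "either an A or a T". The function name and the example sequence below are made up, and real promoter finding is a lot messier than a single pattern match.

```python
import re

# The pattern 5'-TATA(A/T)A(A/T)-3' written as a regular expression.
TATA_BOX = re.compile(r"TATA[AT]A[AT]")


def find_tata_boxes(sense_strand: str) -> list:
    """Return the 0-based start position of every TATA-box-like match."""
    return [match.start() for match in TATA_BOX.finditer(sense_strand.upper())]


# Made-up example sequence, not a real promoter.
dna = "GGCCTATAAAAGGCGCGATGGCC"
print(find_tata_boxes(dna))  # [4]
```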
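
To make the idea of codons concrete, here is a minimal sketch that finds the first ATG on a sense strand and chops everything after it into groups of 3. The function name and the example strand are made up, and it ignores everything a real cell does to pick the right ATG.

```python
def codons_from_start(sense_strand: str) -> list:
    """Find the first ATG and split everything from there into 3-base codons."""
    seq = sense_strand.upper()
    start = seq.find("ATG")
    if start == -1:
        return []  # no start codon found
    coding = seq[start:]
    # Step through in threes and drop an incomplete codon at the end.
    usable = len(coding) - len(coding) % 3
    return [coding[i:i + 3] for i in range(0, usable, 3)]


# Made-up example strand.
print(codons_from_start("CCATGGCCTTTTAA"))  # ['ATG', 'GCC', 'TTT', 'TAA']
```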

Making mRNA (Transcription)

A special enzyme called RNA polymerase attaches at a promoter region (a TATA-box is part of one), then goes down the DNA anti-sense strand from the 3' end to the 5' end and creates mRNA (messenger RNA, since it carries a message).

RNA (RiboNucleic Acid) is like DNA, except that instead of being a mirrored double helix it is just a single strand. Instead of T (thymine), RNA has U (uracil). RNA has a direction just like DNA, but RNA polymerase makes the mRNA in the opposite direction to the one it is reading, so mRNA is made 5' to 3'.

When the RNA polymerase sees an A it makes a U, when it sees a T it makes an A, and when it sees a C it makes a G (and a C for a G), just like the mirroring for DNA but with T swapped for U. In the end the mRNA looks just like the sense side DNA except it has U's where the DNA has T's.
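
The RNA pairing rules can be sketched the same way as the DNA ones. The names `DNA_TO_RNA` and `transcribe` and the example strand are made up for this sketch, and it assumes the anti-sense strand is written 3' to 5' so the mRNA comes out 5' to 3'.

```python
# RNA pairing rules: like DNA pairing, but an A on the DNA gets a U in the RNA.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(antisense_3_to_5: str) -> str:
    """Build mRNA (5'->3') from an anti-sense strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in antisense_3_to_5.upper())


template = "TACGGCAAA"   # made-up anti-sense strand, written 3'->5'
sense = "ATGCCGTTT"      # its sense-side partner, written 5'->3'

mrna = transcribe(template)
print(mrna)                             # AUGCCGUUU
print(mrna == sense.replace("T", "U"))  # True: same as the sense side with U for T
```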

Problems

Type the transcribed mRNA sequence for the given DNA anti-sense strand. Remember the pairing rules for RNA!

Given DNA Strand:

Building the Protein (Translation)

The mRNA leaves the DNA in the nucleus and travels to another part of the cell, where it meets a thing called a ribosome. The ribosome takes the mRNA and uses tRNA to make a polypeptide, which will become a protein.

tRNA (transfer RNA) is a cool little dude that has a mirror image of 3 RNA bases (an anti-codon) and carries an amino acid on its head. If the anti-codon of the tRNA matches the next 3 bases (the codon) on the mRNA, it joins with it briefly and leaves its amino acid behind. The amino acids link to each other with peptide bonds, and then the tRNA floats off. AUG is usually the start codon on the RNA side, and the ribosome keeps going until it reaches a stop codon.

Transfer RNA and protein synthesis
Image: tRNA and protein synthesis

Some proteins need extra help folding, or are headed for the cell membrane or out of the cell altogether. These go to a weird place called the rough ER (endoplasmic reticulum) and get folded there. If a protein needs to go to the rough ER, the entire ribosome carrying the mRNA will actually dock onto the outside of the rough ER while it is still making the protein. It feeds the growing amino acid chain directly into the ER, like a 3D printer pushing plastic into a box. Then the protein goes off to where it is needed.

Genes don't just code for proteins, but I think it helps to look at protein synthesis first.

Problems

Use the codon table below to translate the given mRNA sequence into a polypeptide chain. Type the 3-letter abbreviation for each amino acid, separated by spaces (e.g., Met Pro Asp).

Given mRNA Strand:

Standard RNA Codon Table

| 1st base | 2nd base | 3rd base U | 3rd base C | 3rd base A | 3rd base G |
| --- | --- | --- | --- | --- | --- |
| U | U | UUU Phe (Phenylalanine) | UUC Phe (Phenylalanine) | UUA Leu (Leucine) | UUG Leu (Leucine) |
| U | C | UCU Ser (Serine) | UCC Ser (Serine) | UCA Ser (Serine) | UCG Ser (Serine) |
| U | A | UAU Tyr (Tyrosine) | UAC Tyr (Tyrosine) | UAA STOP (Ochre) | UAG STOP (Amber) |
| U | G | UGU Cys (Cysteine) | UGC Cys (Cysteine) | UGA STOP (Opal) | UGG Trp (Tryptophan) |
| C | U | CUU Leu (Leucine) | CUC Leu (Leucine) | CUA Leu (Leucine) | CUG Leu (Leucine) |
| C | C | CCU Pro (Proline) | CCC Pro (Proline) | CCA Pro (Proline) | CCG Pro (Proline) |
| C | A | CAU His (Histidine) | CAC His (Histidine) | CAA Gln (Glutamine) | CAG Gln (Glutamine) |
| C | G | CGU Arg (Arginine) | CGC Arg (Arginine) | CGA Arg (Arginine) | CGG Arg (Arginine) |
| A | U | AUU Ile (Isoleucine) | AUC Ile (Isoleucine) | AUA Ile (Isoleucine) | AUG Met (Start) |
| A | C | ACU Thr (Threonine) | ACC Thr (Threonine) | ACA Thr (Threonine) | ACG Thr (Threonine) |
| A | A | AAU Asn (Asparagine) | AAC Asn (Asparagine) | AAA Lys (Lysine) | AAG Lys (Lysine) |
| A | G | AGU Ser (Serine) | AGC Ser (Serine) | AGA Arg (Arginine) | AGG Arg (Arginine) |
| G | U | GUU Val (Valine) | GUC Val (Valine) | GUA Val (Valine) | GUG Val (Valine) |
| G | C | GCU Ala (Alanine) | GCC Ala (Alanine) | GCA Ala (Alanine) | GCG Ala (Alanine) |
| G | A | GAU Asp (Aspartic Acid) | GAC Asp (Aspartic Acid) | GAA Glu (Glutamic Acid) | GAG Glu (Glutamic Acid) |
| G | G | GGU Gly (Glycine) | GGC Gly (Glycine) | GGA Gly (Glycine) | GGG Gly (Glycine) |

Start codon: AUG. Stop codons: UAA, UAG, UGA.
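
Looking codons up in the table by hand can also be sketched in code. This is a simplified illustration, not how a ribosome really works: it starts at the first AUG it finds, reads 3 bases at a time, and stops at the first stop codon. The names `CODONS_BY_AMINO_ACID`, `CODON_TABLE`, and `translate` are made up for this sketch.

```python
# The standard codon table, grouped by amino acid to keep it short.
CODONS_BY_AMINO_ACID = {
    "Phe": "UUU UUC", "Leu": "UUA UUG CUU CUC CUA CUG", "Ile": "AUU AUC AUA",
    "Met": "AUG", "Val": "GUU GUC GUA GUG", "Ser": "UCU UCC UCA UCG AGU AGC",
    "Pro": "CCU CCC CCA CCG", "Thr": "ACU ACC ACA ACG", "Ala": "GCU GCC GCA GCG",
    "Tyr": "UAU UAC", "His": "CAU CAC", "Gln": "CAA CAG", "Asn": "AAU AAC",
    "Lys": "AAA AAG", "Asp": "GAU GAC", "Glu": "GAA GAG", "Cys": "UGU UGC",
    "Trp": "UGG", "Arg": "CGU CGC CGA CGG AGA AGG", "Gly": "GGU GGC GGA GGG",
    "STOP": "UAA UAG UGA",
}

# Flip it around so each codon looks up its amino acid.
CODON_TABLE = {
    codon: amino_acid
    for amino_acid, codons in CODONS_BY_AMINO_ACID.items()
    for codon in codons.split()
}


def translate(mrna: str) -> list:
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    seq = mrna.upper()
    start = seq.find("AUG")
    if start == -1:
        return []  # no start codon, so nothing gets made
    peptide = []
    for i in range(start, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "STOP":
            break
        peptide.append(amino_acid)
    return peptide


# Made-up example mRNA.
print(" ".join(translate("AUGCCGGAUUAA")))  # Met Pro Asp
```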

What Happens When Things Go Wrong? (Mutations)

We’ve talked about how DNA is the perfect instruction manual. But what happens if there's a typo in the manual? When the DNA sequence is changed, we call it a mutation.

Because the ribosome reads mRNA in blocks of 3 (codons), a mutation can have different effects. Imagine a sentence made only of 3-letter words, just like codons. Let's use: THE FAT CAT ATE THE RAT.

There are two main ways this can get messed up:

1. Point Mutations (Substitution)
This is when one single letter gets swapped out for another. If we swap the 'C' in CAT for a 'B', our sentence becomes: THE FAT BAT ATE THE RAT. It changed the meaning of that one word, but you can still read the rest of the sentence.

Sometimes, because multiple different codons can code for the exact same amino acid (check your table above!), a point mutation doesn't change the protein at all. The typo goes entirely unnoticed. We call that a silent mutation.

2. Frameshift Mutations (Insertion or Deletion)
This happens when a letter is randomly added or deleted. Remember, the ribosome strictly reads in blocks of 3. If we accidentally delete the 'F' in FAT, everything shifts over one spot to the left to fill the gap. Our sentence becomes: THE ATC ATA TET HER AT.

It is complete gibberish! Frameshift mutations usually completely break the protein because every single amino acid after the mutation is wrong.
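
The sentence analogy is easy to play with in code. In this small sketch, `read_in_threes` is a made-up helper that reads letters in blocks of 3, the way the ribosome reads codons.

```python
def read_in_threes(letters: str) -> str:
    """Split a string into 3-letter 'words', the way a ribosome reads codons."""
    return " ".join(letters[i:i + 3] for i in range(0, len(letters), 3))


original = "THEFATCATATETHERAT"
print(read_in_threes(original))      # THE FAT CAT ATE THE RAT

# Point mutation: swap the C of CAT (position 6, counting from 0) for a B.
substituted = original[:6] + "B" + original[7:]
print(read_in_threes(substituted))   # THE FAT BAT ATE THE RAT

# Frameshift: delete the F of FAT (position 3, counting from 0).
deleted = original[:3] + original[4:]
print(read_in_threes(deleted))       # THE ATC ATA TET HER AT
```

Notice that the substitution only changes one word, while the deletion changes every word after it.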

Do point mutations actually matter?

They absolutely can! Sickle cell anemia is a genetic disease that affects the hemoglobin protein (the oxygen carrier we looked at earlier). It is caused by exactly one wrong letter in the DNA. A single point mutation changes the amino acid Glutamic Acid (Glu) to Valine (Val). That one tiny swap makes hemoglobin molecules stick together into long, stiff fibers when they are not carrying oxygen, which bends the red blood cell into a sickle shape!

Normal vs Sickle Red Blood Cell
Image: Normal round red blood cell vs crescent sickle cell
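
As a sketch, here is how a one-letter change in a codon can either change the amino acid or go unnoticed. GAG to GUG is the codon change usually given for sickle cell, but it is an assumption here because the text above does not name the codon. GAG to GAA is a made-up silent example. The function name `classify_point_mutation` is hypothetical.

```python
# Just the three codons needed here, copied from the standard codon table.
CODON_TABLE = {"GAG": "Glu", "GAA": "Glu", "GUG": "Val"}


def classify_point_mutation(original: str, mutated: str) -> str:
    """Compare the amino acids coded by two codons that differ by one base."""
    differences = sum(a != b for a, b in zip(original, mutated))
    if len(original) != 3 or len(mutated) != 3 or differences != 1:
        raise ValueError("expected two codons differing by exactly one base")
    before, after = CODON_TABLE[original], CODON_TABLE[mutated]
    if before == after:
        return f"silent: {before} stays {after}"
    return f"changed: {before} -> {after}"


print(classify_point_mutation("GAG", "GUG"))  # changed: Glu -> Val
print(classify_point_mutation("GAG", "GAA"))  # silent: Glu stays Glu
```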

Problems

A point mutation has occurred! Look at the original mRNA and its translation. Then, use the codon table to translate the mutated mRNA. Type the 3-letter abbreviations.

Original mRNA:
Original Protein:
Mutated mRNA: