Detailed information on the composition of the covid-19 vaccine
Matej Huš: Dec 26, 2020 at 22:48; Science and technology
It sounds simple, especially if we translate it into computer terms. The covid-19 vaccine contains the source code for part of the virus, namely for the spike protein (protein S), which sticks out of its envelope. When this source code is introduced into the body, our cells produce copies of the protein. The immune system encounters it, recognizes it as foreign and prepares a response, thereby protecting us from any invader that carries such a protein on its surface; at present, that means SARS-CoV-2. This is how vaccines based on modRNA (modified messenger RNA) work. Today we'll look under the hood at what exactly the Pfizer and BioNTech vaccine contains, the vaccine with which Slovenia will start vaccinating the most at-risk groups tomorrow.
The vaccine contains the active ingredient, modRNA with the appropriate genetic sequence, and excipients. Let's look at the latter first. In its approval the European Medicines Agency (EMA) also lists the excipients (page 9): water, sucrose (table sugar), sodium chloride (table salt), sodium hydrogen phosphate dihydrate (Na2HPO4·2H2O), potassium chloride (KCl) and potassium dihydrogen phosphate (KH2PO4). These salts are normally present in body fluids; they are in the vaccine to give it the right ionic strength so that cells do not suffer osmotic shock (the two phosphates also act as a buffer that keeps the pH stable). Cell membranes are permeable to water, which flows from the side with fewer dissolved particles to the side with more, so water without dissolved salts would flow into cells and make them burst. In practice we all know not to drink large amounts of distilled water and that infusions must be physiological saline rather than plain water. In addition, the vaccine contains cholesterol, DSPC (1,2-distearoyl-sn-glycero-3-phosphocholine), ALC-0159 (2-[(polyethylene glycol)-2000]-N,N-ditetradecylacetamide) and ALC-0315 (((4-hydroxybutyl)azanediyl)bis(hexane-6,1-diyl)bis(2-hexyldecanoate)). Despite their complicated names, these are relatively simple lipid (fat) molecules. Cholesterol is a lipid with a sterane skeleton, and in the vaccine it has little to do with the notorious LDL and HDL we usually mean when we talk about cholesterol levels. Those are much larger particles built of proteins and lipids (lipoproteins) that carry many fat molecules around the body. The job of the lipids in the vaccine is to protect its genetic material: they form lipid nanoparticles in which the modRNA is enclosed.
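To put a number on "physiological", here is a minimal sketch that estimates the osmolarity of physiological saline (0.9 % NaCl, i.e. 9 g of salt per litre). It assumes an ideal solution in which every NaCl unit splits completely into two ions; the measured value is somewhat lower because real ions do not behave ideally. For comparison, blood plasma is roughly 290 mOsm per kilogram of water, while distilled water has essentially no dissolved particles, which is why cells placed in it take up water.

```python
# Minimal sketch: ideal osmolarity of a salt solution.
# ASSUMPTIONS: ideal solution, complete dissociation (NaCl -> Na+ + Cl-).

NA = 22.990  # molar mass of sodium, g/mol
CL = 35.45   # molar mass of chlorine, g/mol


def osmolarity_mosm_per_l(grams_per_litre: float, molar_mass: float, ions: int) -> float:
    """Return the ideal osmolarity in milliosmoles per litre."""
    mol_per_litre = grams_per_litre / molar_mass
    return mol_per_litre * ions * 1000


# Physiological saline: 0.9 % NaCl = 9 g of salt per litre of solution.
saline = osmolarity_mosm_per_l(9.0, NA + CL, ions=2)
print(f"0.9 % NaCl:      ~{saline:.0f} mOsm/L")

# Distilled water: no dissolved particles to balance the inside of the cell.
water = osmolarity_mosm_per_l(0.0, NA + CL, ions=2)
print(f"distilled water: ~{water:.0f} mOsm/L")
```

The gap between roughly 300 and 0 mOsm/L is what drives water into cells when a solution is not isotonic.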
The world and biology are not as spectacular as Hollywood movies would have us believe. If we naively injected foreign RNA or DNA into the body, we would not turn into some mutated organism. Enzymes (nucleases), together with the immune system, would simply break down the foreign genetic material without us even noticing. This happens every day: with food we take in huge amounts of foreign DNA and RNA without noticing. Even when mosquitoes, ticks and other insects bite us in summer, foreign genetic material gets into the blood, where it quickly degrades (it is more of a problem if they also pass on a parasite, bacterium or virus). Genetic material on its own is very fragile and unstable, which is why even viruses, these extremely simplified seeds of life (are they even alive?), keep a protective coat in which their RNA or DNA is wrapped. Likewise, the genetic material in the vaccine needs a fatty envelope in order to be delivered to cells and enter them.
Once the vaccine's genetic material enters the cell, it stays in the cytoplasm. It cannot get into the cell nucleus, and it does not need to: the nucleus holds the cell's own DNA, while proteins are made outside it. Proteins are assembled on ribosomes, which read messenger RNA (mRNA), best imagined as working copies of relevant parts of the DNA. In the nucleus, enzymes first read the DNA and transcribe it into mRNA, which then travels from the nucleus into the cytoplasm. Ribosomes are not too picky: any mRNA they find (provided it is properly shaped, see below) they happily translate into protein. The vaccine slips them exactly such an mRNA. It of course did not come from the cell's DNA, but that does not bother the ribosome at all.
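Transcription, the copying step described above, is easy to mimic on strings. A minimal sketch on a hypothetical nine-letter toy gene, not any real sequence: the DNA template strand is read base by base and each base is paired with its RNA complement (A with U, T with A, G with C, C with G). The resulting mRNA has the same sequence as the other (coding) DNA strand, only with U in place of T.

```python
# Minimal sketch of transcription on a hypothetical toy sequence.
# DNA base -> complementary RNA base (RNA uses U where DNA would use T).
RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Build mRNA (written 5'->3') from a DNA template strand written 3'->5'."""
    return "".join(RNA_COMPLEMENT[base] for base in template_3_to_5.upper())


def coding_strand_to_mrna(coding_5_to_3: str) -> str:
    """Shortcut: mRNA equals the coding (non-template) strand with T -> U."""
    return coding_5_to_3.upper().replace("T", "U")


template = "TACAAACCC"   # hypothetical template strand, written 3'->5'
coding = "ATGTTTGGG"     # the matching coding strand, written 5'->3'
mrna = transcribe(template)
print(mrna)                                   # AUGUUUGGG
print(mrna == coding_strand_to_mrna(coding))  # True
```

Whether a string like this was transcribed from the cell's own DNA or delivered by a vaccine, the ribosome only ever sees the finished mRNA.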
Now let's see what exactly the vaccine slips them (the RNA known as BNT162b2, international nonproprietary name tozinameran). The whole sequence is 4284 nucleotides long. Each nucleotide consists of a base, the sugar ribose and a phosphate group. This is the basic building block of RNA (and of DNA, where the sugar is deoxyribose). The ribose and phosphate are the same in every nucleotide and form the backbone to which the bases are attached, while the bases differ. In DNA they are adenine (A), guanine (G), cytosine (C) and thymine (T); in RNA, uracil (U) takes the place of thymine. The 30 micrograms of RNA in one vaccine dose encode everything our body needs to learn to defend itself against the virus.
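How many RNA molecules fit into 30 micrograms? A back-of-the-envelope sketch: the length (4284 nucleotides) and the dose (30 µg) come from the text, while the average mass of about 340 g/mol per nucleotide is an assumed round figure for RNA. The cap, the modified nucleotides and counterions are ignored, so treat the result as an order of magnitude only.

```python
# Back-of-the-envelope estimate of mRNA molecules per dose.
AVOGADRO = 6.02214076e23   # particles per mole
LENGTH_NT = 4284           # nucleotides per BNT162b2 molecule (from the text)
DOSE_G = 30e-6             # 30 micrograms of RNA per dose (from the text)
AVG_NT_MASS = 340.0        # ASSUMED average mass of one RNA nucleotide, g/mol

molar_mass = LENGTH_NT * AVG_NT_MASS   # g/mol for one full-length strand
moles = DOSE_G / molar_mass
molecules = moles * AVOGADRO

print(f"one strand: ~{molar_mass / 1000:.0f} kDa")
print(f"molecules per dose: ~{molecules:.1e}")
```

Under these assumptions a single dose carries on the order of ten trillion copies of the instructions.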
These 4284 letters hold 8568 bits of information (each position has four possible bases, so two bits are needed to record it), or about 1 kilobyte. The virus's entire genome carries about seven times as much, as it is roughly 30,000 nucleotides long. The genetic code is read in triplets. The reason is simple: with a four-letter alphabet we need at least 20 different words, one for each amino acid that proteins are built from. Pairs of letters would give only 4 × 4 = 16 combinations, which is too few, while triplets give 4 × 4 × 4 = 64. This means we can afford some redundancy: there are 64 three-letter combinations (called codons), but only 20 amino acids, plus three codons that signal a stop. Nature did not share them out evenly. Tryptophan is encoded by a single codon (UGG), so any point mutation in it changes the amino acid (or turns it into a stop codon), while leucine has six (UUA, UUG, CUA, CUC, CUG, CUU). This will be important later.
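Both calculations in this paragraph can be checked in a few lines. A sketch that builds the standard genetic code (the 64-letter string is the widely used compact form of NCBI translation table 1, with codons ordered UUU, UUC, UUA, ..., GGG and '*' marking stop codons) and counts how many codons map to each amino acid:

```python
import math
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in the order UUU, UUC, UUA, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Information content: four possible bases -> log2(4) = 2 bits per nucleotide.
LENGTH_NT = 4284
bits = LENGTH_NT * int(math.log2(len(BASES)))
print(f"{bits} bits = {bits / 8:.0f} bytes")

# Why triplets: doublets give too few words for 20 amino acids.
print("doublets:", len(BASES) ** 2, "triplets:", len(BASES) ** 3)

# Redundancy of the code: how many codons encode each amino acid?
per_amino_acid = Counter(CODON_TABLE.values())
stops = per_amino_acid.pop("*")  # stop codons are not amino acids
print(f"{len(CODON_TABLE)} codons, {len(per_amino_acid)} amino acids, {stops} stop codons")
print("tryptophan (W):", per_amino_acid["W"], "codon")
print("leucine (L):", sorted(c for c, aa in CODON_TABLE.items() if aa == "L"))
```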
The entire genetic code of the vaccine has been published, as has the sequence of protein S in the virus (and of the whole virus). Just as in computing, the genetic code contains headers, signposts, orientation data and so on. The sequence starts with the nucleotides GA, which represent the cap at the 5' end (the ends of an RNA chain are labelled 5' and 3' because the ribose sugars in the backbone are linked through their 3' and 5' carbon atoms). This cap is essential for the mRNA to be active at all, similar to how scripts in Linux need the #! declaration at the beginning. It is followed by a part that is not translated (5'-UTR, untranslated region). This part is needed because the ribosome cannot start reading at the very beginning: it must physically attach to the mRNA and in doing so covers some of the initial nucleotides itself. Translation starts at the AUG codon, which marks the start of the protein. The first short stretch is the signal peptide: a few amino acids that are cleaved off and are not part of the mature protein S, but serve to direct it. They sit at the beginning of the newly made protein and tell the cell what to do with it: where it should go and whether it should be modified further (post-translational modifications), e.g. by attaching chemical groups such as sugar chains to certain amino acids.
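The reading rules described here (skip the leader, start at AUG, read triplets, stop at a stop codon) can be mimicked with a toy "ribosome". This is a deliberately simplified sketch: real ribosomes scan the 5'-UTR and also weigh the sequence context around the AUG, and the toy mRNA below is hypothetical, not taken from the vaccine.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code (codons UUU, UUC, ..., GGG); '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_first_orf(mrna: str) -> str:
    """Toy ribosome: find the first AUG, then read codons until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing is made
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy mRNA: 'GA' standing in for the capped start, a short 5'-UTR,
# a start codon, three more codons, a stop codon and a few A's.
toy = "GA" + "CCUUCC" + "AUG" + "UUUCUGGGC" + "UGA" + "AAAAAA"
print(translate_first_orf(toy))  # MFLG
```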
So after the two cap nucleotides and 52 untranslated nucleotides, 48 nucleotides are translated into the signal peptide (16 amino acids), followed by the code for protein S (nucleotides 103-3879). If we compare this code with the code for the same protein in the actual virus, we notice quite a few differences, which is why this RNA is called modified (modRNA). The first obvious difference is the absence of uracil. Instead of the letter U, the code contains Ψ, which stands for 1-methyl-3'-pseudouridylyl (N1-methylpseudouridine). The reason is quite clever. Foreign RNA in the body is not very long-lived, as the immune system destroys it. It turns out that this substitution makes the RNA much less conspicuous to the immune system, which largely leaves it alone, while it causes no problems for translation into protein, because ribosomes still read Ψ as U. Therefore every U in the vaccine is replaced by Ψ. The second change is the replacement of some A and Ψ/U with G and C where this does not affect the amino acid encoded by the codon (as we saw above, several codons encode the same amino acid). In double-stranded DNA this increases stability, since G and C are joined by three hydrogen bonds, while A and T are joined by two. RNA is single-stranded, but these changes are still beneficial because they increase the efficiency of translation and protein synthesis. None of these changes affects the final product: the cell still makes the same protein S as the virus has. Such changes are called synonymous substitutions.
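A sketch of the two kinds of change described here, on hypothetical codon pairs rather than the real vaccine sequence. The helper treats Ψ as U, as the ribosome does, and compares what the original and modified codons encode; it also reports the share of G and C to show what a synonymous change toward G and C looks like.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code (codons UUU, UUC, ..., GGG); '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def as_read(codon: str) -> str:
    """What the ribosome 'sees': Ψ (1-methylpseudouridine) is read as U."""
    return codon.replace("Ψ", "U")


def gc_fraction(codon: str) -> float:
    """Share of G and C letters in a codon."""
    return sum(base in "GC" for base in codon) / len(codon)


def classify(original: str, modified: str) -> str:
    """Compare an original codon with its modified counterpart."""
    if as_read(original) == as_read(modified):
        return "identical"  # only U swapped for Ψ, read the same way
    a, b = CODON_TABLE[as_read(original)], CODON_TABLE[as_read(modified)]
    return "synonymous" if a == b else f"non-synonymous ({a} -> {b})"


# Hypothetical examples:
print(classify("UUU", "ΨΨΨ"))  # identical
print(classify("CUU", "CΨG"))  # synonymous: both leucine
print(gc_fraction("CUU"), gc_fraction("CΨG"))
print(classify("AAA", "CCΨ"))  # non-synonymous (K -> P)
```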
A detailed comparison shows that two substitutions are not synonymous. Two codons are changed so that instead of lysine and valine they encode proline. This is needed so that the protein S made by the cell has the same shape as the one on the virus. The immune system recognizes shape, so the two must be as close to identical as possible. If protein S is made in the cell and left on its own, it tends to fold over into a different shape from the one it has on the viral envelope. The immune response would then learn to recognize a protein that differs from the viral one, which would be useless. With these substitutions, however, protein S floating freely in solution keeps the same shape as the viral one. It is no coincidence that proline is the key: it is the only one of the standard amino acids whose side chain loops back and bonds to its own amino group, which is therefore a secondary rather than a primary amine. In practice this makes it much more rigid. Inserted in the right place, it "locks" the protein structure.
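Amino acid changes like these are conventionally written as original amino acid, position, new amino acid; for the two prolines in the vaccine's protein S this gives K986P and V987P (lysine at position 986 and valine at 987, the so-called "2P" stabilization). A small sketch that produces such labels from two aligned protein fragments; the two-residue window below is illustrative only.

```python
def substitution_labels(reference: str, variant: str, first_position: int = 1) -> list[str]:
    """List amino acid changes as <original><position><new>, e.g. 'K986P'.

    Both sequences must already be aligned (same length, same numbering).
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        f"{ref}{first_position + offset}{alt}"
        for offset, (ref, alt) in enumerate(zip(reference, variant))
        if ref != alt
    ]


# Illustrative window: residues 986-987 of protein S, virus vs. vaccine.
print(substitution_labels("KV", "PP", first_position=986))  # ['K986P', 'V987P']
```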
At the end come two UGA codons, which are stop codons: there the ribosome stops protein synthesis. They are followed by another untranslated part (3'-UTR), which also has to be there. It has several functions, many of which are not yet understood, but it certainly affects polyadenylation, translation efficiency, the stability of the mRNA itself and where in the cell it ends up. The mRNA ends with a run of A's (the poly(A) tail), which protects it against degradation. Some of these A's are trimmed off as the mRNA is used, so there is an optimal number of A's at the end of an mRNA for the best gene expression.
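Both features of the mRNA's tail end are easy to inspect programmatically. A sketch on a hypothetical toy sequence (not the vaccine's, and with an arbitrary tail length): one helper counts back-to-back stop codons at the end of an in-frame coding sequence, the other measures the poly(A) tail, and trimming A's from the end mimics the gradual shortening of the tail.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def trailing_stops(coding: str) -> int:
    """Count consecutive stop codons at the end of an in-frame coding sequence."""
    if len(coding) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    count = 0
    for i in range(len(coding) - 3, -1, -3):
        if coding[i:i + 3] not in STOP_CODONS:
            break
        count += 1
    return count


def poly_a_length(mrna: str) -> int:
    """Length of the run of A's at the 3' end (the poly(A) tail)."""
    return len(mrna) - len(mrna.rstrip("A"))


# Hypothetical toy mRNA: coding part ending in UGA UGA, a short 3'-UTR, 30 A's.
coding = "AUGUUUUGAUGA"
mrna = coding + "GCUAGC" + "A" * 30

print(trailing_stops(coding))    # 2
print(poly_a_length(mrna))       # 30
print(poly_a_length(mrna[:-5]))  # 25 after five A's are trimmed off
```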
In summary: the vaccine is an aqueous solution of a few salts, which keep the pH and ionic strength right so that cells do not go into shock, and a handful of lipids that wrap the mRNA so it can enter cells without being destroyed. Inside the cell the mRNA stays in the cytoplasm, where ribosomes use it to make the same protein S that the virus carries on its surface, and the immune system then learns to respond to it. To make the vaccine more effective, its mRNA is not exactly the same as the virus's. Instead of the nucleotide U (uridine) it uses a synthetic analog to avoid destruction by the immune system, and it contains substitutions that do not change the encoded amino acids but make synthesis more efficient. At two positions amino acids are substituted so that protein S on its own keeps the same shape as on the virus. Because of these modifications we speak of modified messenger RNA (modRNA). The RNA itself consists of the initial cap, an untranslated leading section, the signal peptide, the section translated into protein, an untranslated trailing section and a poly(A) tail that protects against degradation. Of course, the mRNA does not last long in the cell: it is soon degraded.
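The overall layout can be written down as a small table of regions. A sketch using the coordinates given earlier in the text (1-based, inclusive); the text does not say where the 3'-UTR ends and the poly(A) tail begins, so the last 405 nucleotides are kept together as one region. The code only checks that the pieces add up; it does not contain the actual sequence.

```python
from typing import NamedTuple


class Region(NamedTuple):
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1

    def slice_of(self, sequence: str) -> str:
        """Cut this region out of a full sequence (converts to 0-based slicing)."""
        return sequence[self.start - 1:self.end]


TOTAL_NT = 4284
LAYOUT = [
    Region("cap nucleotides (GA)", 1, 2),
    Region("5'-UTR", 3, 54),
    Region("signal peptide", 55, 102),
    Region("protein S coding sequence", 103, 3879),
    Region("3'-UTR + poly(A) tail", 3880, TOTAL_NT),
]

# Sanity checks: regions are contiguous and cover the whole molecule.
for previous, current in zip(LAYOUT, LAYOUT[1:]):
    assert current.start == previous.end + 1, (previous.name, current.name)
assert sum(r.length for r in LAYOUT) == TOTAL_NT

for r in LAYOUT:
    print(f"{r.name:<27} {r.start:>5}-{r.end:<5} {r.length:>5} nt")
print("signal peptide codons:", LAYOUT[2].length // 3)  # 16
```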