Imagine if software engineers wrote some code, then had to wait two or more weeks before they could execute that code on a computer. Imagine if, every time they needed to edit that code, it would take an additional four to fourteen days just to run the newly edited code. The world would be a far different place. These are the status quo timelines in molecular biology research.
Plasmids — short, circular pieces of DNA — are the underpinnings of modern biological R&D. Plasmids are the letters scientists write to cells. They encode genes and other genetic elements used to manipulate cellular function. Every new vaccine, antibody, CRISPR gene editing technique, or cancer cure starts with the humble plasmid, or rather hundreds to thousands of plasmids. Talk to any wet-lab molecular biologist, and they’ll probably say that their workday involved designing, creating, mutating, transforming, or analyzing data from plasmids.
[Figure: an electron microscope image of a plasmid; a visual abstraction of a plasmid that encodes several genetic elements; and a zoomed-in snippet of a plasmid showing double-stranded DNA.]
Despite this ubiquity in modern biology, plasmids take too long to make. And you’re never making just one; many projects involve tens, hundreds, even thousands of plasmids. Consider what it was like to program computers in the 1970s: scientists used holes punched in paper cards to manually communicate a function to a computer. It was painstakingly slow. Analogously, to communicate a function to a cell, scientists have to manually assemble fragments of DNA to form a plasmid and input that into a cell. Better methods of information storage — magnetic disks and semiconductors — made punch cards obsolete and massively advanced computing. A similar overhaul in plasmid construction techniques is critically needed.
Note that, as with punch cards, synthesizing DNA can be cheap in monetary terms: seven cents per base. The real cost is time.
Consider these two anecdotes:
Plasmids, and access to them, were essential for rapid research on Covid-19.
While plasmids enabled every step of the decades-long development of mRNA Covid-19 vaccines, we will focus on a narrow anecdote of how rapid access to a single plasmid rippled through labs across the world.
Pseudotyping is the process of producing a virus with foreign viral envelope proteins. It allows researchers to analyze one aspect of a dangerous virus in a safer, easier-to-work-with virus, much like analyzing the mechanics of a bomb without the explosives. During the pandemic, pseudotyping the SARS-CoV-2 spike protein onto another virus was essential, as most research labs (which are Biosafety Level 1 or 2) were not equipped to work directly with the SARS-CoV-2 virus (which is Biosafety Level 3).
In May 2020, Jesse Bloom’s Lab published a protocol to pseudotype the SARS-CoV-2 spike protein onto lentivirus, which is very commonly used in biological research. Bloom then deposited the plasmid used in the study onto Addgene, a global repository for plasmids. This enabled researchers from around the world to order the plasmid — as opposed to designing and building it for themselves from scratch — and use it in conjunction with other plasmids to produce the pseudotyped spike protein in lentivirus. Researchers could study the spike protein, study neutralizing antibodies, and assess vaccine efficacy all in a safe environment. While tens of thousands of plasmids likely went into developing the vaccine, access to just this one plasmid notably accelerated Covid-19 research everywhere.
Before the Covid-19 pandemic, Addgene had just “a handful” of coronavirus plasmids. Now the repository has more than 2,400 in its collection and has received over 13,000 requests for such plasmids across 75 countries.
Plasmids aren’t just used in single-finger-poke inquiries; libraries of hundreds of plasmids are used to test many possible designs. Consider the work of Han Spinner, a graduate student in Biological and Biomedical Science at Harvard. Han is creating a tool that scans a patient’s transcriptional landscape and changes the mRNA present in the cell. Such a tool could be used to induce apoptosis in certain cancers, for example.
To create such a tool, Han and their collaborators are evolving a newly-discovered CRISPR protein that targets mRNA. In the same way that humans have steered the genetics of a wolf towards a Border Collie, we can steer the DNA that codes for naturally occurring CRISPR proteins towards a disease-preventing molecular machine, but on the timescale of a graduate degree, rather than tens of thousands of years. This is done with plasmids.
To prioritize sites in the CRISPR protein’s sequence that might evolve it towards the desired function, Han uses information from the scientific literature and from three kinds of machine learning models: one trained on protein structures, one trained on related CRISPR protein sequences, and one trained on the whole universe of known proteins. Using a method called site-saturation mutagenesis, Han and their collaborators produce 94 mini-libraries, one per prioritized site, each with nineteen variants (nineteen being the number of possible amino acid swaps at a single position, since there are twenty standard amino acids and one is already present in the wildtype). That comes to 94 × 19 = 1,786 unique plasmids, one coding for each variant (plus a couple for experimental controls). Han and their team will put these roughly 1,800 plasmids into cells and assess how each resulting protein variant performs.
You can think of this as if 94 breeders each had litters of 19 puppies, and now they must select which puppies are the most improved at herding sheep in comparison to last year’s best-in-show (aka the wildtype protein), and breed those. But here, herding sheep is the desired molecular function and the puppies are CRISPR proteins.
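For readers who think in code, the arithmetic behind that library can be sketched in a few lines of Python. This is only an illustration of how a site-saturation library is enumerated, not Han's actual pipeline: the function name `saturation_variants`, the toy wildtype sequence, and the choice of positions 1 through 94 are all hypothetical placeholders, since the real protein and the real prioritized sites are not given here.

```python
# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def saturation_variants(wildtype: str, sites: list[int]) -> list[str]:
    """List every single-residue substitution at the given 1-based sites.

    Each variant is labelled in the usual shorthand, e.g. "M1A" means
    the wildtype M at position 1 is replaced by A.
    """
    variants = []
    for pos in sites:
        wt_residue = wildtype[pos - 1]
        for aa in AMINO_ACIDS:
            if aa != wt_residue:  # skip the residue already present
                variants.append(f"{wt_residue}{pos}{aa}")
    return variants


# ASSUMPTION: a made-up 100-residue sequence and made-up site choices,
# used only to reproduce the counts described in the text.
toy_wildtype = "MKTAYIAKQR" * 10
toy_sites = list(range(1, 95))  # 94 positions

library = saturation_variants(toy_wildtype, toy_sites)
print(len(library))  # 94 sites x 19 swaps each
```

Each label in `library` corresponds to one plasmid that has to be built, which is why a modest-sounding list of 94 sites turns into nearly 1,800 constructs.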
Spinner’s work demonstrates a shift in how scientists are developing biological therapeutics. The future of precision medicine rests on our ability to rapidly screen thousands of protein variants and select the one with the most potent therapeutic potential. This extends beyond just CRISPR-based therapeutics and into any other biologic, like hormones — long-acting insulin, for example — monoclonal antibodies, even to the new field of de novo protein design. All of this work is only possible because we know how to make and use plasmids.
During our conversation, Han shared an insight that felt particularly pertinent to the problem of plasmid production: the value of momentum, of keeping the ball in the air. Science is an inherently human practice. Whether or not we care to admit it, projects ebb and flow, or sink entirely, based on the momentum and motivation of the scientists doing the work. In the iterative cycles of evolving a protein, computational biologists like Han juggle all the exciting details of the project: ideas, nuances, caveats, and context. Their heads are in the game. Then they toss each ball to their experimentalist counterpart, and weeks pass by, maybe months if the library of variants is large and difficult to assemble. As Han puts it, “Imagine if, me, a machine learning person, could actually send someone a list of variants and then next day, or two days later, have feedback from them, keep that momentum, and keep the ball in the air.”
Accelerating plasmid production will accelerate every iterative design-build-test cycle in synthetic biology. It will not only make research go faster but also enable better research, because the most critical resource, human minds, will no longer be spending brain power, creative energy, manual labor, and time on plasmid production.