Good Designers Borrow

Creating A Visual System For Modified RNA

Po Bhattacharyya
Benchling Engineering

--

In this post, Po, a designer at Benchling, narrates how he helped build a design tool for RNA scientists working with chemically modified sequences. Read on to learn more!

In the summer of 2020, as the COVID-19 pandemic threatened to swallow the planet, Martin, a molecular biologist at a London-based startup, sat squinting at Microsoft Excel on his computer. That’s the software he used to map out his research, which focused on modified RNA, the same technology that would give rise, in just a few months, to the life-saving vaccines developed by Pfizer and Moderna. His own work involved cancer, specifically the development of treatments that would restore normal function to tumor cells. It was an approach that ran counter to the vast majority of treatments on the market, options such as surgery, chemotherapy and radiation therapy, which aim to cure cancer by killing diseased cells rather than healing them.

Martin’s research pushed at the bleeding edge of science, but the software he used struggled to keep up with it. In the absence of more advanced solutions, an Excel spreadsheet had become his de facto design tool for the RNA sequences he wished to assemble. He spent long hours using the tool each week. The visual system he developed was meticulous: three separate cells for each distinct unit of a sequence (more on this later), a custom color for each chemical modification, custom formatting as well. To the well-trained eye, the results were impressive: RNA sequences, those long, tiny, formidably complicated molecules, could now be seen as human-readable cells on a spreadsheet.

Fig 1. An example of what Martin’s spreadsheets looked like. The name “Martin” is a pseudonym. Moreover, his data have been jumbled and anonymized to protect his intellectual property.

As ingenious as Martin’s system was, it suffered a few significant drawbacks. During our first conversation, he described these in considerable detail: how working in Excel required a lot of painstaking copying and pasting; how there was no way for him to programmatically detect and prevent errors; how it was incredibly difficult to search through his work as the volume of data burgeoned with time. The same theme ran through all the issues Martin brought up: Excel is a general-use, industry-agnostic tool that isn’t backed by any sort of biological awareness. This made his visual system impossible to scale.

In other words, it was the software, not the complexity of the science, that threatened to become the bottleneck in Martin’s research. And it wasn’t just Martin. As RNA design tools went, the playing field was nearly empty. Many of the scientists we spoke with reported the same problems in their own workflows — data integrity, error prevention, and scale — the kinds of problems that Benchling has amassed some experience tackling: DNA in 2013, proteins in 2016. In late 2020, we decided to take on RNA as well. I was the designer on the project. “I’ve never done anything like this before,” I remember thinking, which is a thought I have before every project. After that, I got to work.

You’re probably living under a rock if you’ve never heard of RNA. What used to be DNA’s lesser-known cousin is now being hailed as a promising new technology in medicine. Part of what makes RNA so opportune is that it’s already woven into our bodies’ natural processes. Our cells use RNA to understand what they should do next. For example, if there’s RNA floating around that instructs a cell to build a COVID spike protein, the cell will comply. After that, our immune system will detect the presence of that spike protein, make antibodies to target it, and suddenly there’s an army of antibodies ready to aid the body’s battle against COVID. That’s basically how Pfizer’s vaccine works. Moderna’s as well.

The mechanisms underlying RNA-based vaccines have been known to science for many decades. The hard part is getting that RNA to float around in our cells in the first place. RNA is pretty flimsy. It is usually broken down by the body in a matter of minutes. RNA injected into people’s bloodstreams, as a vaccine, say, would be destroyed long before it could even get to the cells. Through much of the 20th century, the transience of RNA was deemed an intractable problem. It was only in the 2000s that science finally awoke to the solution: the rather innocuously named modified RNA.

The “modified” qualifier comes from the presence of chemical modifications that diverge from the composition of naturally occurring RNA. Some modifications make the RNA stronger, so it doesn’t get destroyed by the body quite as quickly. Other modifications make the RNA more performant, allowing it to enter cells with greater ease, say, or get those cells to produce greater quantities of the spike protein.

The ability to dial up the stability and specificity of RNA is a crucial innovation, part of what makes the COVID-19 vaccines so effective. But designing modified RNA is tedious work, and that’s where the software comes in. At its best, software can streamline biotechnology, simulating biological systems to the degree of complexity the scientist desires. That’s part of the whole value proposition of Benchling. In the realm of modified RNA, our goal is to serve as both a design tool and a system of record, allowing scientists to create their sequences with greater ease while collaborating more effectively with colleagues along the way.

As the RNA-design-tool project got underway, I was tasked with developing a visual system to represent the chemical modifications in RNA at scale. I started by speaking to scientists such as Martin — people who had developed home-grown solutions to meet their needs. Everyone had hit upon a slightly different solution, one that worked for them, but only sort of. In one instance, I came across a drug manufacturing company that had assembled a software-engineering team whose sole mandate was to build out an in-house design tool for modified RNA.

You can think of an RNA sequence as a series of beads on a string. In naturally occurring RNA, each bead has three component molecules: a sugar called ribose, a phosphate, and one of four nitrogenous bases: G, C, A, or U. Each component molecule can be modified in several viable ways, and significant trial and error is involved in arriving at the appropriate combination of modifications. Along the way, it is important for scientists to visualize the sequences they’re working with in terms of both the location and type of each modification. A question they often need to answer at a glance: which beads are chemically modified, and how?
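
For readers who think in code, the beads-on-a-string picture can be sketched as a small data structure. This is only an illustration under my own assumptions, not Benchling's actual data model: the names `Bead` and `modified_positions` are hypothetical, and the modification names in the usage lines (2'-O-methyl, phosphorothioate) are common examples from the field rather than anything taken from Martin's work.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

NATURAL_BASES = {"G", "C", "A", "U"}


@dataclass(frozen=True)
class Bead:
    """One unit of an RNA sequence: a base, a sugar, and a phosphate.

    A value of None means the component is in its natural form.
    (Hypothetical model for illustration only.)
    """
    base: str
    base_mod: Optional[str] = None
    sugar_mod: Optional[str] = None
    phosphate_mod: Optional[str] = None

    def __post_init__(self) -> None:
        if self.base not in NATURAL_BASES:
            raise ValueError(f"unknown base: {self.base!r}")

    def modified_components(self) -> List[str]:
        """Return the names of the components that carry a modification."""
        parts = [
            ("base", self.base_mod),
            ("sugar", self.sugar_mod),
            ("phosphate", self.phosphate_mod),
        ]
        return [name for name, mod in parts if mod is not None]


def modified_positions(sequence: List[Bead]) -> Dict[int, List[str]]:
    """Answer 'which beads are modified, and how?' using 1-based positions."""
    result: Dict[int, List[str]] = {}
    for position, bead in enumerate(sequence, start=1):
        components = bead.modified_components()
        if components:
            result[position] = components
    return result


# Usage: a three-bead sequence with one sugar and one phosphate modification.
example = [
    Bead("A"),
    Bead("U", sugar_mod="2'-O-methyl"),
    Bead("G", phosphate_mod="phosphorothioate"),
]
print(modified_positions(example))
```

Keeping each component as its own field mirrors the three spreadsheet cells Martin used per unit, which is part of why the question of "which beads, and how" reduces to a simple lookup.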

After a round of exploratory conversations, it was time to turn to the scientific literature. In a 2017 paper by Khvorova and Watts, RNA sequences are depicted as rows of circles connected via arcs. The color of the circles represents sugar modifications, while that of the arcs represents phosphate modifications. In another paper, Chernikov et al. (2019), a pared-down version of the same visual shows up. In this instance, the arcs appear only when a phosphate modification exists; when a particular bead includes the phosphate’s natural form, the arc vanishes.

Fig 2. Left, a (grainy) visualization from Khvorova and Watts (2017). Right, a (slightly less grainy) visualization from Chernikov et al. (2019).

“Good artists borrow, great artists steal,” Picasso may or may not have once said. As an aspiring good artist, I borrowed liberally: the colors, the shapes, the formatting, all of it. For a while, the going was ugly. I played with dots above letters, lines below them, fills and strokes and boxes and shadows, the funkiest of the funky shapes. That’s how design works, really. You throw the colander of pasta skyward, hoping, praying that a noodle will stick. In a way, that’s how science works as well. The last idea standing — that’s the one you go with.

Fig 3. Some examples of early explorations.

On this occasion, though, nothing stuck. Not for a while, anyway. When I presented the above ideas for critique, I was told again and again that the visualizations weren’t informative enough. The ideal solution would keep the underlying RNA sequence (i.e., the bases) intact and readable while also communicating the location and type of each modification. The dots weren’t working, and neither were the highlights, the chapeaus, or the indents. I should try something different, I was gently advised, even as I scraped the bottom of the barrel for inspiration.

Where do good ideas come from? I do not know. All I remember is the precise moment at which lightning finally struck. I was perusing The Visual Display of Quantitative Information by Edward Tufte, staring at a graphic depicting Napoleon’s disastrous invasion of Russia in 1812, when it suddenly hit me: what if my ideas weren’t bad so much as insufficient? What if I could find a way to combine them somehow, to say more with less, to fuse a bunch of different concepts into one?

I got to work right away. After a day of feverish noodling, what I ended up with was this: a compositional system made up of hollow rings, superscripted diamonds, and filled circles with a letter inside each. Accordingly, four separate dimensions are used to convey four different pieces of information. The text spells out the base; the background indicates whether that base is modified; the outline describes the sugar; and an added shape, off to the top right, describes the phosphate.

Fig 4. An image showing the eight possible beads in an RNA sequence, from the left: (1) no modifications; (2) modified phosphate only; (3) modified sugar only; (4) modified base only; (5) modified sugar and phosphate; (6) modified sugar and base; (7) modified base and phosphate; (8) modified sugar, base, and phosphate.
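
The eight beads in the figure fall out of three independent yes/no choices (sugar, base, phosphate), each mapped to its own visual channel. Here is a minimal sketch of that mapping as I read it from the description above. The function names and the dictionary keys are hypothetical, chosen for illustration; the shipped tool may encode things differently.

```python
from itertools import product
from typing import Dict, List, Union

Style = Dict[str, Union[str, bool]]


def encode_bead(base: str, sugar: bool, base_mod: bool, phosphate: bool) -> Style:
    """Map one bead to four independent visual channels.

    Assumed reading of the design: the letter is the base, a filled
    background flags a modified base, a ring outline flags a modified
    sugar, and a diamond at the top right flags a modified phosphate.
    """
    return {
        "text": base,
        "filled_background": base_mod,
        "ring_outline": sugar,
        "corner_diamond": phosphate,
    }


def all_bead_styles(base: str = "A") -> List[Style]:
    """Enumerate every combination of the three modification flags."""
    return [
        encode_bead(base, sugar, base_mod, phosphate)
        for sugar, base_mod, phosphate in product([False, True], repeat=3)
    ]


# Usage: three binary channels give 2 ** 3 distinct bead styles.
for style in all_bead_styles("U"):
    print(style)
```

Because no channel depends on another, any combination of modifications stays legible, which is the point of fusing several partial ideas into one glyph.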

The next challenge was how to extend this visual system in order to show even greater detail at the current resolution. For instance, how might one be able to tell different types of sugar modifications apart? This time, the answer seemed obvious: the tool would allow scientists to customize their visualizations within well-defined constraints. Accordingly, they might pick unique stroke colors for different sugars, unique fill colors for different bases, and unique shapes for different phosphates. Here’s an example of what a user-customized version of the visualization looked like:

Fig 5. A user-customized version of the visual system showing custom colors and shapes for modifications.
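
"Customize within well-defined constraints" can be sketched as a theme that users fill in and a validator that rejects anything outside a fixed set of options. Everything named here is an assumption made for illustration: the `Theme` class, the `validate_theme` function, the allowed color and shape lists, and the modification names in the usage lines are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical option lists; a real tool would define its own.
ALLOWED_COLORS = {"red", "orange", "green", "blue", "purple"}
ALLOWED_SHAPES = {"diamond", "triangle", "square"}


@dataclass
class Theme:
    """User choices: modification name -> visual value, per channel."""
    sugar_stroke: Dict[str, str] = field(default_factory=dict)
    base_fill: Dict[str, str] = field(default_factory=dict)
    phosphate_shape: Dict[str, str] = field(default_factory=dict)


def validate_theme(theme: Theme) -> List[str]:
    """Return a list of problems; an empty list means the theme is usable."""
    problems: List[str] = []
    channels = [
        ("sugar stroke", theme.sugar_stroke, ALLOWED_COLORS),
        ("base fill", theme.base_fill, ALLOWED_COLORS),
        ("phosphate shape", theme.phosphate_shape, ALLOWED_SHAPES),
    ]
    for label, mapping, allowed in channels:
        for modification, value in mapping.items():
            if value not in allowed:
                problems.append(
                    f"{label}: {value!r} for {modification!r} is not an allowed option"
                )
        values = list(mapping.values())
        # Each modification in a channel needs its own value to stay distinguishable.
        if len(set(values)) != len(values):
            problems.append(f"{label}: two modifications share the same value")
    return problems


# Usage: one valid theme, and one that reuses a stroke color.
good = Theme(
    sugar_stroke={"2'-O-methyl": "blue", "2'-fluoro": "green"},
    phosphate_shape={"phosphorothioate": "diamond"},
)
bad = Theme(sugar_stroke={"2'-O-methyl": "blue", "2'-fluoro": "blue"})
print(validate_theme(good))
print(validate_theme(bad))
```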

Finally, there was the question of scale. How might our visualization preserve the information it communicates even as a user zooms out to a larger number of beads? Here, too, the solution flowed naturally. A zoomed-out view would necessarily cause the letters of an RNA sequence (A, U, G, C) to shimmy up against each other. This meant that the modification information would need to move either up or down while remaining aligned to the particular bead it described. This idea fulfilled the need for consistency as well: if we could pull it off, the legend wouldn’t need to change very much at all as the user zoomed out.

Fig 6. A zoomed-out version of the visualization showing shapes and colors above the sequence instead of on it.
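
The alignment rule for the zoomed-out view can be shown with plain text: the letters sit shoulder to shoulder on one row, and each modification marker goes on a row above, in the same column as the bead it describes. This is a toy sketch, not how the tool renders anything; `render_zoomed_out` and its single-character markers are hypothetical.

```python
from typing import Dict


def render_zoomed_out(bases: str, markers: Dict[int, str]) -> str:
    """Render a marker row above a sequence row, column-aligned.

    `markers` maps a 0-based bead index to a one-character symbol.
    """
    for index, symbol in markers.items():
        if not 0 <= index < len(bases):
            raise ValueError(f"marker index {index} is outside the sequence")
        if len(symbol) != 1:
            raise ValueError("each marker must be exactly one character")
    # Beads without a marker get a space so the columns stay aligned.
    marker_row = "".join(markers.get(i, " ") for i in range(len(bases)))
    return marker_row + "\n" + bases


# Usage: mark the second and fourth beads of a four-bead sequence.
print(render_zoomed_out("AUGC", {1: "*", 3: "^"}))
```

Since the markers keep their meaning and only their position changes, the same legend can serve both zoom levels.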

Benchling shipped the first version of its RNA design tool in the summer of 2021. It took the whole village: crackerjack engineers, a visionary product manager, diverse members from the field organization, plus timely assists from a hundred other kind and brilliant brains. All through the development process, we also worked closely with an advisory group of RNA biologists, including Martin. We relied on them for feedback and guidance, and we counted on their patience as well — working with Silicon Valley know-it-somes such as myself tends to require oodles of that.

When the tool finally came out, it caused a bit of a splash in the bioinformatics microcosm. Some lauded it on Twitter. Others agreed to be quoted directly on the brand-new RNA therapeutics page of benchling.com and on the company blog as well. For me, though, the ultimate signal of our legitimacy was that Martin abandoned his home-grown visual system and switched over to Benchling’s tool at once. He encouraged his colleagues to do the same. Earlier this year, when he changed jobs, he took Benchling with him, persuading his new employer to adopt our tool as well.

It goes without saying that I’m proud of the tool we’ve built. But the work isn’t done, not even close. The science of modified RNA changes every day; it crawls and tiptoes and sprints and leaps. According to The Atlantic, the technology is being put to work against a cornucopia of diseases: not just COVID-19 and cancer but multiple sclerosis, malaria, and the seasonal flu as well. A GlobalData analysis identified over two dozen RNA-based vaccines and therapies that are scheduled to enter clinical trials this year. These developments make great demands of the software that supports them. Here at Benchling, we are always playing catch-up. Will we ever get there? Probably not. And that’s a good thing.

We’re hiring!

If you’re interested in working with us to build the future of biotech R&D platforms, check out our careers page or contact us!
