Oncogenesis and Protein Folding

This post is mostly about my project last summer at the University of Utah. I realize that my "audience" is probably not going to be biologists or biochemists, so I'm going to try to make this post as accessible as I can to those without that background. We're going to learn about cancer!

Really quickly, you will just need to know that:

  1. All living things are composed of cells
  2. Humans are multicellular organisms, but unicellular organisms exist

Overview and the Cell Cycle

In order to talk about cancer, we first have to talk about the cell cycle. For a multicellular organism, the above is a representation of the states each of its cells could be in. Every time a cell passes through M, a new, duplicate cell is produced.

Now, we can imagine that in a healthy organism this process would need to be heavily regulated, and indeed it is. We don't want cells dividing willy-nilly in our body; we need them to divide at a regulated rate or in response to signals from other cells. Think about what happens when you get a cut -- we need lots of new cells to replace the ones that are no longer there! However, if there's no injury, it's unnecessary and even detrimental for us to be producing all these excess cells.

The loss of this regulation is essentially the first step of oncogenesis AKA the formation of a tumor.

Central Dogma

In order to talk about loss of regulation, we need to talk about the central dogma of molecular biology.

In simplest terms, the central dogma of molecular biology states that "DNA makes RNA and RNA makes protein." This represents the basic flow of information in each of the cells in your body.

Each of your cells has long strands of a relatively stable molecule called DNA. In fact, each of your cells (with some exceptions) has the exact same strands as every other cell you have. DNA by itself does not do anything, though; it simply encodes information. In order for a cell to make protein, which we can think of as the machinery of the cell, the relevant stretch of DNA must first be transcribed into RNA. RNA is a molecule that is similar in structure to DNA, but is single-stranded and much less stable. As such, it's fairly transient in a cell. This RNA molecule is then translated into protein. Protein serves many purposes, but for our discussion we will think of it primarily as a signaling molecule and as cellular "machinery".

The best analogy I'm able to think of is this: DNA is the library of the master copies of all the blueprints of all the possible machinery a cell can produce. It's risky to produce the machinery from the master copy (for reasons you're about to see!), so we make a temporary copy of it and use that as the basis for our machinery. It should be noted that, like our "blueprint copies", the machinery is also relatively short-lived in a cell and will eventually be degraded.
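If you think better in code than in blueprints, here is a minimal sketch of the same "DNA makes RNA and RNA makes protein" flow. The function names and the example sequence are made up for illustration, and the codon table only contains the handful of codons this example needs (the real table has 64 entries).

```python
# Toy codon table: only the codons used in this example.
CODON_TABLE = {
    "AUG": "Met",   # also the start codon
    "GAA": "Glu",
    "AAA": "Lys",
    "GUG": "Val",
    "UAA": "STOP",
}

def transcribe(dna_coding_strand):
    """DNA -> RNA: the RNA copy reads like the coding strand with T swapped for U."""
    return dna_coding_strand.upper().replace("T", "U")

def translate(mrna):
    """RNA -> protein: read three bases at a time until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein

gene = "ATGGAAAAAGTGTAA"      # made-up "master blueprint"
mrna = transcribe(gene)       # temporary copy of the blueprint
protein = translate(mrna)     # the machine built from the copy
print(mrna, protein)
```

The point is only the direction of information flow: the gene is read, never modified, and everything downstream is built from the copy.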

I'm going to try and stick to this "machinery" analogy when possible for the rest of this post.

Cancer is a disease of mutation, and you might be able to guess that it's DNA that's mutated. If we were to have "erroneous" RNA, we might produce some defective machines, but they won't last too long. Similarly, if we have an "erroneous" protein, we again just have a defective machine that isn't going to last very long. If we acquire an error in our DNA, however, then we've potentially modified the master copy of the blueprint of some machine, and every one of these specific machines we make thereafter is going to be "erroneous." Even worse, every single one of this cell's descendants will have this same error.
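The difference between an error in a copy and an error in the master can be sketched in a few lines. This is a toy model with made-up names and a made-up sequence: an error in one RNA copy disappears as soon as fresh copies are made, while an error in the DNA shows up in every later copy and in the daughter cell.

```python
def transcribe(dna):
    """DNA -> RNA: swap T for U (coding-strand convention)."""
    return dna.replace("T", "U")

def point_mutation(seq, position, new_base):
    """Return a copy of seq with a single base changed."""
    return seq[:position] + new_base + seq[position + 1:]

dna = "ATGGAAAAAGTGTAA"   # made-up master blueprint

# Case 1: an error in a single RNA copy.
rna_copies = [transcribe(dna) for _ in range(3)]
rna_copies[0] = point_mutation(rna_copies[0], 3, "A")   # one bad temporary copy
later_copies = [transcribe(dna) for _ in range(3)]      # fresh copies come from intact DNA

# Case 2: the same error in the DNA itself (codon GAA, Glu, becomes AAA, Lys).
mutant_dna = point_mutation(dna, 3, "A")
mutant_copies = [transcribe(mutant_dna) for _ in range(3)]
daughter_dna = mutant_dna   # cell division copies the DNA, error included

print(later_copies)    # all normal again
print(mutant_copies)   # every copy carries the error
```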

So, back to a loss of regulation. Certain proteins signal or push the cell to go through the aforementioned cell cycle. These proteins are referred to as oncoproteins, and the genes that encode them as oncogenes. We can think of them as the gas pedal for the cell cycle. Some other kinds of proteins do the opposite and pause or slow the cell's transition through the cell cycle. These proteins are called tumor suppressors, and we think of them as the brakes on the cell cycle. So, looking at all of this, you might think that a defect-causing mutation in a tumor suppressor -- the "brakes" -- might result in cancer. And indeed, that's one way it can happen! But it's more likely for a mutation in an oncogene -- the gas pedal -- to be a driving mutation in cancer. This might seem odd: we said that mutations cause defects, so a defect in an oncogene should mean the cell would move more slowly through the cell cycle, right? The explanation for this will bring us (finally) to protein 3D structure.

Regulation at the protein level

When I said that a protein pushes a cell through the cell cycle, I didn't give you the full picture. It's usually not enough for these proteins to exist in the cellular goop in order for them to function; they need to be switched on. A protein's function is determined by the chemical properties of its active site(s). For example, if a protein slices DNA, then its active site has chemical properties that allow it to slice DNA. These chemical properties aren't easily changed, so how can a protein rapidly and easily turn on or off? One answer is a change in its three-dimensional structure, or "conformation".

We can think of many proteins as being chemical "globs." Any given protein has a certain three-dimensional structure that is determined by its chemical properties. For example, it's energetically favorable for a positive and a negative part of the protein to be close to one another, while it's energetically unfavorable for two positive or two negative portions of the protein to be close together. Certain parts of a protein may be favored (for reasons) to be on the outside of the protein glob -- facing the cellular goop -- while others are favored to be internal, facing other parts of the protein. An example of a globular protein is shown.

Let's imagine we have a protein whose active site binds DNA. In its initial state, it's very neatly folded with lots of favorable interactions between the positive and negative parts. The active site is hidden at the center of the glob though, so it's unable to do its function at the present time. If, all of a sudden, one of these positive parts becomes negative or neutral, our favorable interaction now becomes an unfavorable or neutral interaction, causing our glob to take on a new shape. In this new shape, our active site is exposed, and now we can bind DNA. An example picture is shown.

This is one example of how a cell can regulate the activity of its proteins. By chemically modifying certain regions of a protein we can turn on or turn off certain activities.

Back to our imagined DNA-binding protein, let's say it's also an oncoprotein: when it binds DNA, it increases transcription of the genes for other proteins that cause the cell to go through the cell cycle. If the gene encoding it is mutated such that one of our normally negative regions is now neutral or positive, our active site is now permanently exposed and always on. We now have a cell with the accelerator pedal for the cell cycle taped down! This single mutation in DNA has given us a loss of regulation of the cell cycle and thus tumorigenesis.
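Here is a deliberately cartoonish sketch of that switch. It assumes a protein with just two charged regions and scores the "closed" shape with a Coulomb-like product of their charges (negative means favorable); the "open" shape, with the regions far apart, scores zero. Real folding energetics are vastly more complicated, and every name below is made up for illustration.

```python
def favored_conformation(charge_a, charge_b):
    """Pick the lower-energy shape in a two-region toy model."""
    closed_energy = charge_a * charge_b   # opposite charges attract -> negative score
    open_energy = 0                       # regions apart: no interaction in this toy
    return "closed" if closed_energy < open_energy else "open"

def is_active(charge_a, charge_b):
    """The active site is only exposed in the open shape."""
    return favored_conformation(charge_a, charge_b) == "open"

# Normal protein: region A is positive, region B is negative.
resting = is_active(+1, -1)      # folded shut, active site hidden
switched_on = is_active(-1, -1)  # the cell chemically modifies A to negative

# Mutant protein: the DNA change makes region B neutral in every copy.
mutant_resting = is_active(+1, 0)
mutant_modified = is_active(-1, 0)

print(resting, switched_on)             # False True  -> regulated switch
print(mutant_resting, mutant_modified)  # True True   -> stuck "on"
```

The normal protein is only active when the cell modifies it, while the mutant is active no matter what the cell does, which is the taped-down gas pedal.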

So....

So now we have some understanding of how a mutation in DNA can cause proteins to fold incorrectly, resulting in incorrect function. Where can we go from here? Knowing which mutations are likely to cause cancer is far from being a solved problem. We may have some genetic information about a mutation that's been acquired by an individual and want to ask, "Is this individual now at an increased risk for cancer?"

My project last summer was to use a 3D structural prediction model to simulate what proteins with known oncogenic mutations might look like in vivo. Using this data, our plan is to use machine learning methods, in combination with known non-oncogenic mutations, to classify mutations of unknown oncogenicity. Our preliminary results suggested that oncogenic mutations in oncogenes made the resulting proteins less stable, indicating that these mutated proteins were more likely to go through changes in three-dimensional structure. Oncogenic mutations in tumor suppressor genes tended to result in proteins that were more stable and thus less likely to undergo conformational shifts, indicating that they may have become "locked" in an inactive state.
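To give a feel for what "classify mutations of unknown oncogenicity" could mean in practice, here is a minimal nearest-centroid sketch. This is not the method or the data from the project: the single feature (a predicted stability change for mutations in an oncogene, in arbitrary units), the labels, and every number below are placeholders invented purely for illustration.

```python
from statistics import mean

def centroid(rows):
    """Average each feature column over a list of feature vectors."""
    return [mean(column) for column in zip(*rows)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit(training):
    """training maps a label to a list of feature vectors."""
    return {label: centroid(rows) for label, rows in training.items()}

def predict(centroids, features):
    """Assign the label whose centroid is closest to the new mutation."""
    return min(centroids, key=lambda label: squared_distance(centroids[label], features))

# Placeholder values, NOT real measurements: one hypothetical feature per
# mutation, where negative means "predicted to be less stable".
training = {
    "oncogenic": [[-2.0], [-1.5], [-2.5]],
    "non-oncogenic": [[0.1], [-0.2], [0.3]],
}

model = fit(training)
unknown_mutation = [-1.8]   # hypothetical mutation of unknown effect
print(predict(model, unknown_mutation))
```

A real classifier would need many more features and far more careful validation; the sketch only shows the shape of the problem, with labeled examples going in and a guess for an unlabeled mutation coming out.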

Further thoughts

There are some 250k encoded proteins in the genome, most of which have unknown functions. We may want to ask "Could this gene be an oncogene or a tumor suppressor?" Or given that we know a mutation in a gene is oncogenic, we may want to ask whether this gene is an oncogene or a tumor suppressor. If we can classify known oncogenes or known tumor suppressors by how they look when they are mutated, we might be able to answer open questions about the role of other proteins in the cell.