Dr. DNA Dan - A Genetics & Genomics Podcast

Ancient Artifacts Inside Your DNA

DNA Dan Season 2 Episode 1

In this episode, Dr. Dan Handley takes us into the fascinating world of proteins—those powerhouse molecules that do the heavy lifting in our bodies. While the human genome contains about 20,000 protein-coding genes, these genes produce over 100,000 different proteins, thanks to nature's incredible complexities. Dr. Handley breaks down the science behind how our genetic code expands to create such diversity in proteins, revealing the intricate processes that keep our bodies running smoothly.


Tune in each month as DNA Dan dives into the evolving world of genetics, from ancestry and health testing to genetics in mental health and forensics. With over 30 years of experience in genomics, SCU Professor Dan Handley, M.S., Ph.D., brings unique insights into how genetics shapes our lives and the future of precision medicine.

Please visit our website for more information about Southern California University of Health Sciences' Master of Science Program in Human Genetics & Genomics. https://bit.ly/SCU-DNA_Dan

I’m Dr. Dan Handley, professor of human genetics and genomics at Southern California University of Health Sciences. This is a podcast about all things related to human genetics, genomics, and the future of precision medicine.

In the last episode, I discussed alternative splicing of messenger RNA and how it was nature’s way of making a lot of different proteins from a single gene. I also referred to it as nature’s mad scientist, because when you think about it, it is pretty weird. I also promised you that as we go further into the human genome, things get even weirder.

Today I’m going to talk about a major feature of the human genome.

We tend to talk a lot about genes when discussing genomics, but only about one percent of the entire human genome actually codes for proteins. That’s remarkable and pretty mysterious, if you ask me. For a long time, geneticists referred to all this non-coding DNA as so-called “junk DNA.” You may have heard of that term. Well, as research progresses, scientists are finding out that this non-coding DNA is anything but junk, and much of the time it has important functions. We are only beginning to find out what all of this non-coding DNA is doing.

Now for what is, at least for me, the mind-blowing part. About half of our genome is what is called repeat DNA. In other words, about half of our DNA is just repeated sequences, over and over. If you recall, DNA sequences are made out of four nucleotides, which I’ll refer to by their abbreviations: As, Cs, Ts, and Gs. DNA repeats can mean repeats of two nucleotides, such as ATATATAT and so forth, or repeats of sequences thousands or millions of nucleotides long.

Many repeats of DNA sequences are directly next to each other, forming a chain. These are called tandem repeats. Remember the term tandem repeats, because our knowledge of them is used in a variety of ways, including forensic DNA testing. But there are other repeats, or identical sequences of DNA, spread all over the genome. These are, not surprisingly, called dispersed repeats.
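For those who like to see an idea in code, here is a minimal Python sketch of what spotting a tandem repeat looks like. It is only an illustration, not how real repeat-finding software works: the function name `find_tandem_repeats`, its parameters, and the example sequence are all made up for this purpose, and it assumes a clean string containing only the letters A, C, G, and T.

```python
import re

def find_tandem_repeats(sequence, unit_length=2, min_copies=3):
    """Return (start, unit, copies) for each back-to-back run of a repeated unit.

    Hypothetical helper for illustration only; real tools also handle
    imperfect repeats and much longer units.
    """
    # Capture a unit of the given length, then require the same unit to
    # follow immediately at least (min_copies - 1) more times.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_length, min_copies - 1))
    results = []
    for match in pattern.finditer(sequence):
        unit = match.group(1)
        copies = len(match.group(0)) // unit_length
        results.append((match.start(), unit, copies))
    return results

# Made-up example sequence containing an AT run and a GT run.
example = "GGCATATATATCCGTGTGTGA"
print(find_tandem_repeats(example))
# → [(3, 'AT', 4), (13, 'GT', 3)]
```

Each result gives the position where the chain starts, the repeated unit, and how many copies sit next to each other. Dispersed repeats would need a different approach, since the copies are scattered rather than adjacent.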

Some repeats are identifiable as duplications of parts of chromosomes or genes. Most of these are inactive, meaning that they don’t produce a protein gene product. But we see these repeated DNA sections in everyone who gets their genome sequenced, so we know they’re quite ancient. 

Some of these non-functional genes are called pseudogenes, because they have most, but not all, of the features that allow them to be transcribed into messenger RNA and then translated into proteins. These also appear to be ancient artifacts in our genomes since there are so many of them. It’s as if some genes were replicated and inserted back into the genome many times, but incurred some kind of mutation that rendered them inoperable. A bit spooky, if you ask me.

But that’s not all. Our genomes also contain what are called mobile genetic elements. Many of these are called transposable elements, or transposons. Some people just call them jumping genes. These pieces of DNA can leave part of a chromosome and insert themselves somewhere else in the genome.

I bet you didn’t know that our genomes aren’t fixed, but have these mobile elements swapping themselves around in our DNA. A high proportion of these transposons seem to have lost their ability to break out of the genome and re-insert themselves. These are fixed transposons, and they account for a lot of the repeat DNA in our genomes. There are about 300,000 fixed transposons in the human genome. Based on studies of other primates, they are estimated to have been fixed into our genomes about 37 to 40 million years ago. That’s truly some ancient, or maybe we should say fossil DNA.

The transposons that move are called autonomous transposons because, well, they act on their own. They code for all the proteins necessary to cut themselves out of the genome and insert themselves somewhere else. They are a bit of a health concern because, although it is very unlikely, they can cut themselves out of a gene or paste themselves into one. Over 100 cases of genetic diseases have been traced to jumping gene insertions, including hemophilia, Duchenne muscular dystrophy, and certain cancers.

Now, there is also a separate type of transposon, called a retrotransposon. These are ancient remnants of viral infections in our distant ancestors dating from millions of years ago. 

They get their name from their similarity to infectious retroviruses. Not many retroviruses infect humans. The most prominent one that does is HIV. Retroviruses contain single-stranded RNA. When a retrovirus gains entry into a cell, it uses a special enzyme to convert its RNA into DNA. This DNA, which contains the virus’s own blueprint, then sneaks into the human cell’s DNA and becomes a permanent part of the cell’s genome.
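To make that RNA-to-DNA step a little more concrete, here is a small Python sketch of the base-pairing logic behind it. This is a simplified illustration under my own assumptions: the function name `reverse_transcribe` and the example sequence are invented, and the real enzyme does far more than swap letters.

```python
# Base-pairing rules for building a DNA strand from an RNA template.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}

def reverse_transcribe(rna):
    """Return the complementary DNA strand, written 5' to 3'.

    Hypothetical helper for illustration only. The new strand pairs with
    the template in the opposite direction, so the template is read in
    reverse while each base is swapped for its partner.
    """
    return "".join(RNA_TO_DNA[base] for base in reversed(rna))

# Made-up six-letter RNA sequence.
print(reverse_transcribe("AUGGCU"))
# → AGCCAT
```

The key point is that the U in RNA pairs with an A in the new DNA strand, and the resulting DNA copy is what ends up in the cell’s genome.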

When the DNA in the cell is transcribed into RNA and then translated into proteins, the cell produces not only its own normal protein products but viral proteins as well. The virus hijacks the cell’s gene expression molecules to produce its own proteins. Quite sneaky. These viral proteins self-assemble into new virus particles. The infected cell now acts as a viral factory as it starts producing thousands or millions of new viral particles, which are then released from the cell to spread and infect more cells.

Many retroviruses managed to get themselves inserted and copied into the human germline millions of years ago but lost their ability to infect cells, and so remain as non-infectious retrotransposons. These are features all of us share in our genomes. These retroviruses are not only non-infectious but have lost their ability to be autonomous, which is fortunate. There are about 400,000 copies of ancient retrovirus remnants in the human genome. They are called Human Endogenous Retroviruses, or HERVs. They make up about 8 percent of our entire human genome, which is pretty significant.

But there is an even older class of ancient retrovirus remnants in not only human DNA, but all mammalian DNA, too. These are called, and I’ll spare you the details about the name, Mammalian Apparent LTR retrotransposons. The letters LTR refer to “long terminal repeats,” which are long, identical sequences repeated at both ends of the retrotransposon sequence. The fact that these retrotransposons are found not only in humans but in many other mammals strongly suggests they are far more ancient in origin than the retrotransposons found only in humans. And they are numerous, with about 300,000 copies of them in the human genome.

Interestingly, retrotransposons are normally transcriptionally inactive, meaning they don’t produce messenger RNA or protein. However, as we age, and in certain disease states, retrotransposons can start becoming transcriptionally active. The significance of this in human health and the process of aging is being actively investigated by scientists. And yes, going into far more detail about that will be the topic of a future podcast.

Getting back to the topic at hand, there are dozens of categories of repeat DNA. It’s really amazing, and going over all those different categories is a bit overwhelming. But I’ll leave you with just one more type of repeat. These ancient DNA elements are called Alu sequences. Why are they called Alu sequences? Alu is the shortened name for Arthrobacter luteus, the bacterium that supplied the enzyme used to discover these sequences. That happens a lot in science, and particularly biology. Things are named after how they were first discovered, which is one reason why a lot of scientific terminology gets confusing.

In any case, Alu sequences are about 300 base pairs long. There are about one million of them dispersed around the human genome. The vast majority of them are fixed, or non-autonomous. But there are a few that are autonomous and do jump around within the human genome. Like all jumping DNA sequences, they have the possibility of initiating cancer in a cell, so research in this area is very active and very important as well.

Stay tuned for the next episode where, yes, I have some more repeats to talk about. If you’re a fan of crime scene dramas and interested in the scientific basis for forensic DNA tests, you’re not going to want to miss this one.