"JunkDNA" (98.7% of human DNA) is not "Junk", and recognizing this requires a generalization of the "Gene concept". News items are posted on the http://www.junkdna.com website (some of them reproduced here from http://www.junkdna.com/new_citations.html ) for discussion. My "two cents" is FractoGene (see the similar website and upcoming book), a geometrization whose first prediction has now received experimental support.

Thursday, July 14, 2005

GPS on the shovel when digging for gold?

[See full posting at http://www.junkdna.com/new_citations.html ]

Genomics study highlights the importance of "junk" DNA in higher eukaryotes

A landmark comparative genomics study appears online today in the journal Genome Research. Led by Adam Siepel, graduate student in Dr. David Haussler's laboratory at the University of California, Santa Cruz, the study describes the most comprehensive comparison of conserved DNA sequences in the genomes of vertebrates, insects, worms, and yeast to date. One of their major findings was that as organism complexity increases, so too does the proportion of conserved bases in the non-protein-coding (or "junk") DNA sequences. This underscores the importance of gene regulation in more complex species. The manuscript also reports exciting biological findings regarding highly conserved DNA elements and the development of a new computational tool for comparing several whole-genome sequences. ... Such approaches are particularly useful for analyzing non-protein-coding sequences, sometimes called "junk" DNA. Although "junk" DNA is poorly understood, the increasing availability of whole-genome sequences is rapidly enhancing the ability of scientists to ascertain the biological significance of these non-protein-coding regions. ... The vertebrates included human, mouse, rat, chicken, and pufferfish, and the insects included three species of fruit fly and one species of mosquito. ... the researchers developed a new computational tool called phastCons. ... The scientists also observed that the proportion of conserved sequences located outside of protein-coding regions tended to increase with genome length and with the species' general biological complexity.

Most strikingly, the researchers discovered that two-thirds or more of the conserved DNA sequences in vertebrate and insect species were located outside the exons of protein-coding genes, while non-protein-coding sequences accounted for only about 40% and 15% of the conserved elements in the genomes of worms and yeast, respectively. ... "These findings support the hypothesis that increased biological complexity in vertebrates and insects derives more from elaborate forms of regulation than from a larger number of protein-coding genes." ... Some of the strongest sequence conservation in vertebrates was observed in the 3' untranslated regions (3'UTRs) of genes, which indicates that post-transcriptional regulation may be a widespread and important phenomenon in more complex species. ... "There really does seem to be a lot more going on at the RNA level than people would have guessed a few years ago," commented Siepel. ... some of the conserved elements may function as long-range transcriptional regulatory elements. ... Not only will the new bioinformatics tool phastCons help researchers identify evolutionarily conserved DNA elements, the reported conserved elements are represented as conservation tracks in the widely used UCSC Genome Browser. "With phastCons and with the conservation tracks in the browser," says Siepel, "we're trying to make it as easy as possible for researchers to home in on functionally important DNA sequences."

California Gold Rush Sold Shovels

[See the full posting at http://www.junkdna.com/new_citations.html ]

Dueling Databases

Can companies still make money selling genomic and molecular information?

[The reader can skip the article if lacking time. The answer is "NO money for data, $$$ for proprietary tools" - AJP]

BIOBUSINESS
Volume 19, Issue 13, Page 42, Jul. 4, 2005. By Ted Agres

Celera Genomics made hundreds of millions of dollars by selling access to its proprietary genome sequence information. But this month, Celera discontinued its database subscription service and made its 30 billion base pairs of genomic data of humans, rats, and mice freely available through GenBank, operated by the US National Center for Biotechnology Information.

Some see Celera's decision to exit the sequence business as proof of the adage that information wants to be free, and yet another sign that selling access to data is no longer a viable business model. ...

During the past few years some database companies (such as Incyte Genomics and Celera) have transitioned to drug discovery and development, while others (such as DoubleTwist) have simply gone out of business. Still, dozens of large and small companies worldwide continue to sell subscriptions to genome databases and molecular libraries, either alone or in combination with other services.

Some of these companies are information providers, such as the American Chemical Society's Chemical Abstracts Service (CAS) and Biobase, a commercial biological database vendor in Germany. Others, such as Integrated Genomics in Chicago and Inpharmatica in London, combine databases with proprietary software and other informatics tools to facilitate discovery of drug candidates.

Making a profit from research-generated data is not an easy matter, says Frank Allen, executive director of Cambridge Crystallographic Data Center, a nonprofit institute spun off from the University of Cambridge. "Some people sit back in their chairs and say, 'It's my divine right to use data that's in the public domain.' Well, it certainly is, but there's a price involved in turning that data into something that's usable. It's either going to come from the public purse or from subscription income." The CCDC maintains the Cambridge Structural Database, a repository of small molecule crystal structures.

The challenge for companies that still sell database subscriptions is to find ways to maintain value amid growing competition from public sources.

"Science is moving towards greater openness in terms of data," says Eric Campbell, professor of health policy at Harvard Medical School. "The issue comes down to protecting one's competitive advantage. You have to have a way to uniquely profit from discoveries and prevent free-riders from hopping in at the end."

A PERISHABLE COMMODITY

Some companies, such as Biobase in Germany, are trying to increase value by curating, annotating, and extending the reach of their databases. Others, like the American Chemical Society's Chemical Abstracts Service (CAS), are attempting to maintain market exclusivity by keeping potential competitors at bay. Novartis and Perlegen Sciences, on the other hand, believe they will generate more business if they allow other researchers access to their proprietary databases. "We don't know if it's collaboration or competition or some combination that will drive science the fastest," Campbell says. "Nobody has studied it before."

Celera knew that its genomic information was a perishable commodity. "There is a time component to the value of information," says Tony Kerlavage, Celera's senior director of online business. In the company's early days, when Celera held a near-monopoly on human genome sequences, pharmaceutical companies and research institutions paid big bucks to access the raw data to locate novel genes and drug targets.

At its height, more than 200 institutions and 25 drug and biotech companies subscribed to the Celera Discovery System (CDS), paying annual fees ranging from thousands to millions of dollars, depending on the number of researchers. Over the years, the CDS was supplanted by such resources as GenBank and Ensembl – a project of the European Bioinformatics Institute and the Sanger Institute. Today, the CDS is useful primarily as a reference source, hence the company's willingness to place it in the public domain. Kerlavage declined to say how many subscriptions expired July 1, closing the service for good.

Three years ago, Celera's parent company, Applera Corp., decided to shift from information to drug discovery and development, and to selling gene expression arrays and diagnostic tools. The move may have been prescient. "If you have complementary services, it may be better to have your data freely available so you can sell more of those other services, whether they are machines or other things," says Arti Rai, a Duke University law professor who focuses on intellectual property in the life sciences.

..."There are structural changes going on in the dissemination of scientific information because of the Internet and because everything has become computer-readable. It's not the same sort of business it used to be," says Heller. "Either you adjust or you have problems."