Mary Edgerton is spearheading an effort to build a warehouse of linked databases. She hopes that linking
clinical and molecular information will yield answers about
the molecular mechanisms underlying disease.
team links clinical, molecular data
by Leigh MacMillan
Mary Edgerton erases the large red and purple cat, her daughter's
artwork, and begins to fill her office white board with interconnected
circles, short dashes, and long arrows. Speaking quickly as she
draws, she explains how databases and computer algorithms can be
used to link, merge, and mine clinical information and related molecular data.
That's the plan, anyway. And Edgerton and her colleagues are
well on their way to what she says some call a "holy grail" of bioinformatics.
"The idea is to build databases that link our clinical information
and our molecular information and to do it in such a way that we
can search on several parameters across all the databases,"
says Edgerton, director of the Molecular Profiling and Data Mining
Shared Resource of the Vanderbilt-Ingram Cancer Center. "Everyone
wants to do this, and nobody's done it."
Edgerton and her colleagues envision a warehouse with multiple
databases: one with an inventory of banked tissues, one with
the clinical information associated with each tissue sample, and
one with microarray and proteomic data. The linkage between the
databases comes from a unique identifier, a barcode,
assigned to a tumor tissue at the surgical pathology bench.
This barcode travels with the tissue as it is used for molecular
experiments, such as gene expression microarray or proteomic studies.
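
As a rough illustration of how a shared barcode could tie the three databases together, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample rows are all invented for the example; the article does not describe the team's actual schema or software.

```python
import sqlite3

# Hypothetical schema: every table name, column name, and row below is an
# assumption made for illustration, not the warehouse's real design.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tissue_inventory (barcode TEXT PRIMARY KEY, freezer_location TEXT);
CREATE TABLE clinical (barcode TEXT PRIMARY KEY, diagnosis TEXT);
CREATE TABLE molecular (barcode TEXT, assay TEXT, result_file TEXT);
""")
conn.execute("INSERT INTO tissue_inventory VALUES ('T0001', 'freezer 3, rack B')")
conn.execute("INSERT INTO clinical VALUES ('T0001', 'lung adenocarcinoma')")
conn.executemany(
    "INSERT INTO molecular VALUES (?, ?, ?)",
    [
        ("T0001", "microarray", "T0001_expr.txt"),
        ("T0001", "proteomics", "T0001_ms.txt"),
    ],
)


def linked_records(conn, barcode):
    """Return one row per molecular assay, joined to inventory and clinical data."""
    query = """
        SELECT i.barcode, i.freezer_location, c.diagnosis, m.assay
        FROM tissue_inventory AS i
        JOIN clinical AS c ON c.barcode = i.barcode
        JOIN molecular AS m ON m.barcode = i.barcode
        WHERE i.barcode = ?
        ORDER BY m.assay
    """
    return conn.execute(query, (barcode,)).fetchall()


rows = linked_records(conn, "T0001")
for row in rows:
    print(row)
```

Because every table carries the same barcode, a single join can pull the storage location, the diagnosis, and each molecular experiment for one tissue sample.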
"The tissue and clinical databases will be a tremendous resource,"
Edgerton says. "If, for example, an investigator says 'I
would like to know how many women between the ages of 29 and 35
developed node-negative breast cancer between one and two centimeters,'
that investigator will be able to search the database and find out
how much of that tissue we have stored." The banked tissues might
then be used for high-throughput molecular analyses.
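
The kind of search Edgerton describes might look something like the following sketch. The record fields, the units for banked tissue, and the sample data are assumptions chosen for illustration; nothing here reflects the real database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    # All field names are hypothetical.
    barcode: str
    sex: str
    age_at_diagnosis: int
    site: str
    node_status: str
    tumor_size_cm: float
    banked_mg: float  # amount of tissue still in storage (assumed unit)


def find_banked(samples, *, sex, age_range, site, node_status, size_range_cm):
    """Return samples matching every criterion that still have tissue banked."""
    lo_age, hi_age = age_range
    lo_cm, hi_cm = size_range_cm
    return [
        s for s in samples
        if s.sex == sex
        and lo_age <= s.age_at_diagnosis <= hi_age
        and s.site == site
        and s.node_status == node_status
        and lo_cm <= s.tumor_size_cm <= hi_cm
        and s.banked_mg > 0
    ]


# Made-up records, purely to exercise the search.
SAMPLES = [
    Sample("T0101", "F", 31, "breast", "negative", 1.4, 250.0),
    Sample("T0102", "F", 52, "breast", "negative", 1.8, 120.0),
    Sample("T0103", "F", 34, "breast", "positive", 1.2, 300.0),
    Sample("T0104", "F", 30, "breast", "negative", 1.1, 0.0),
    Sample("T0105", "F", 29, "breast", "negative", 2.0, 80.0),
]

matches = find_banked(
    SAMPLES,
    sex="F",
    age_range=(29, 35),
    site="breast",
    node_status="negative",
    size_range_cm=(1.0, 2.0),
)
total_mg = sum(s.banked_mg for s in matches)
print([s.barcode for s in matches], total_mg)
```

The ranges are treated as inclusive at both ends here, which is a design choice of the sketch rather than anything stated in the article.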
The team is using research in the lung cancer SPORE (Specialized
Program of Research Excellence) as a launchpad for developing the
first set of linked databases in the warehouse. The effort has involved
determining what clinical information needs to be included and developing
a standard descriptive nomenclature. This is important, Edgerton
says, because doctors may describe the same thing differently, for
example, "metastatic carcinoma to the lymph node" versus
"lymph node with metastatic carcinoma present."
Using a controlled vocabulary to construct the database prevents
investigators from having to think of every synonym when they are
performing a database search. The challenge, Edgerton says, is defining
a vocabulary that allows easy searching and that is simultaneously
flexible enough to adequately describe the tissue pathology.
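
One simple way to picture a controlled vocabulary is as a lookup that maps each accepted phrasing to a single canonical term, so a search only needs the canonical form. The sketch below assumes a tiny, invented vocabulary built from the two phrasings quoted above; the canonical term itself is hypothetical.

```python
import re

# Hypothetical vocabulary: canonical term -> free-text variants it covers.
VOCABULARY = {
    "lymph node, metastatic carcinoma": [
        "metastatic carcinoma to the lymph node",
        "lymph node with metastatic carcinoma present",
    ],
}


def _normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return re.sub(r"\s+", " ", cleaned).strip()


# Build a reverse index from every normalized variant to its canonical term.
SYNONYM_INDEX = {}
for term, variants in VOCABULARY.items():
    SYNONYM_INDEX[_normalize(term)] = term
    for variant in variants:
        SYNONYM_INDEX[_normalize(variant)] = term


def to_controlled_term(free_text):
    """Return the canonical term for a description, or None if unrecognized."""
    return SYNONYM_INDEX.get(_normalize(free_text))


print(to_controlled_term("Metastatic carcinoma to the lymph node."))
print(to_controlled_term("lymph node with metastatic carcinoma present"))
```

An exact-match lookup like this is easy to search but rigid, which mirrors the tension Edgerton describes between easy searching and flexible description.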
"Standards for clinical descriptors and for the storage of gene expression
microarray data are being developed internationally," Edgerton says.
"We work with the experimentalists to maintain expertise in
these standards and comply with them, in the way that we structure
our databases and what terms we use."
In building the clinical database, and making provisions for access
to it, Edgerton and her team also are responsible for maintaining
the security and confidentiality of patient information. The lung
cancer clinical database is nearing completion, Edgerton says, and
it will be used as a template for other organ systems.
In the future, Edgerton would like to add histological images to
the clinical database as one of the tissue characteristics. "Search
and retrieval methods based on image features is an active area
of research," Edgerton says. "Then, as opposed to simply
being able to search based on demographics, stage, histopathological
name, and so on, we might actually also be able to search based
on image characteristics. That would be really exciting."
Edgerton's ultimate vision for the linked databases is to
use them for data mining expeditions. She is experimenting with
various algorithms to analyze the data "in a fashion that combines
something we know clinically with the molecular profile to get at
cause-and-effect." Hidden in the hundreds and hundreds of datasets
that populate the databases, Edgerton believes, are answers about
the molecular mechanisms underlying disease. It's up to bioinformatics
to find them.