Vanderbilt Medical Center

Peer Review newsletter
David Friedman directs the proteomics laboratory, part of the Mass Spectrometry Research Center. Mass spectrometry and bioinformatics are used to identify proteins separated by 2D-gel electrophoresis and other methods.

Bioinformatics yields protein answers

by Leigh MacMillan

Bioinformatics is the key final step in ensuring that the proteomics shared resource can do what it does – identify proteins.

Those proteins might be the ones that change in cells treated with a new chemotherapy drug, or they might be the ones associated with a large complex. Whatever the proteins, the proteomics laboratory draws on several different methods to separate them, and then uses mass spectrometry and bioinformatics to identify them.

“We generate the mass spectrometry data and then rely on the bioinformatics field to get our answers,” says David Friedman, director of the proteomics laboratory, which was established as a component of the Mass Spectrometry Research Center under the leadership of Richard Caprioli.

A common approach for identifying proteins uses 2D-gel electrophoresis to separate mixtures of proteins based on physical attributes – isoelectric point and molecular weight. Individual proteins – spots on the 2D-gel – can be cut out of the gel, digested into peptides, and analyzed by mass spectrometry.

This technology is most often used to find proteins that change, for example under different experimental conditions or in diseased versus normal tissue. For higher throughput, the core takes advantage of fluorescent dye labels and laser imaging. “This is another way we use bioinformatics,” Friedman says. “We can directly compare two or three samples, labeled with different dyes and separated at the same time. The computer algorithm will tell us who’s changing, who’s not changing, and by how much. It’s very powerful.”
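The kind of comparison Friedman describes can be illustrated with a minimal Python sketch. It assumes each dye-labeled sample has already been reduced to one positive intensity per gel spot. The spot identifiers, the intensities, and the 1.5-fold cutoff are hypothetical, and the normalization and statistics that real gel-analysis software applies are left out.

```python
import math


def changed_spots(sample_a, sample_b, min_fold=1.5):
    """Return {spot_id: ratio} for spots whose intensity changed.

    sample_a and sample_b map a spot id to a positive intensity
    (hypothetical inputs, assumed already normalized). The ratio is
    sample_b / sample_a; a spot is reported when it rose or fell by
    at least min_fold.
    """
    cutoff = math.log2(min_fold)
    changed = {}
    # Only spots detected in both dye channels can be compared.
    for spot in sample_a.keys() & sample_b.keys():
        ratio = sample_b[spot] / sample_a[spot]
        # abs(log2) treats a 2-fold increase and a 2-fold decrease alike.
        if abs(math.log2(ratio)) >= cutoff:
            changed[spot] = ratio
    return changed


# Hypothetical intensities for three spots in two samples.
control = {"spot1": 100.0, "spot2": 200.0, "spot3": 50.0}
treated = {"spot1": 110.0, "spot2": 800.0, "spot3": 10.0}
result = changed_spots(control, treated)
```

Here spot2 (up fourfold) and spot3 (down fivefold) would be reported, while spot1 would count as unchanged.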

The core’s automated system allows users to select spots for automatic sampling, digestion, and mass spectrometry analysis. Each protein has a “characteristic signature of tryptic peptides,” Friedman says. Bioinformatic search algorithms compare an experimental “signature” to a theoretical digest of every protein in a selected database and return a match, if one exists in the database.
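The matching step can be sketched in a few lines of Python. This is an illustration of the general idea, not the algorithm the core uses. It assumes the common trypsin rule (cleave after K or R unless the next residue is P), standard monoisotopic residue masses, a simple count of matched peptide masses as the score, and a toy database whose protein names and sequences are invented.

```python
# Monoisotopic residue masses in daltons (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_digest(seq):
    """Split a sequence after K or R, except when followed by P."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal peptide
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of a neutral peptide: residues plus one water."""
    return sum(MONO[aa] for aa in peptide) + WATER


def best_match(observed, database, tol=0.01):
    """Return (name, score) of the protein whose theoretical digest
    explains the most observed masses, or (None, 0) if nothing matches."""
    scores = {}
    for name, seq in database.items():
        theo = [peptide_mass(p) for p in tryptic_digest(seq)]
        scores[name] = sum(
            any(abs(m - t) <= tol for t in theo) for m in observed
        )
    best = max(scores, key=scores.get)
    return (best, scores[best]) if scores[best] > 0 else (None, 0)


# Toy database with invented sequences.
toy_db = {"protA": "AAKGGR", "protB": "SSKTTR"}
signature = [peptide_mass("AAK"), peptide_mass("GGR")]
hit = best_match(signature, toy_db)
```

As in the article, a protein that is absent from the database cannot be returned: a signature with no matching theoretical peptides yields no hit.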

“Our approach is completely dependent on the protein being in the database,” Friedman says. “We rely on the databases being properly annotated, maintained, and continuously updated.” The core makes use of databases containing complete annotated proteins as well as those for expressed sequence tags (ESTs).

The search algorithms for matching experimental mass spectra are either commercially available or free, Friedman says. Like the databases, these algorithms are regularly updated and improved.

The algorithms have to be especially powerful to conduct searches on data from complex mixtures of proteins. Andrew Link, assistant professor of Microbiology & Immunology, and collaborators developed a technology and analysis algorithm called SEQUEST to directly analyze and identify all of the proteins present in a purified protein complex. To speed the analysis, Link built a 20-node parallel processor. The parallel processor, Friedman says, makes the experiment possible – reducing the database search to hours as opposed to many days.
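Why extra nodes help can be shown with a small Python sketch: the database is split into chunks, each chunk is searched independently, and the best hit across chunks is kept. The article does not describe how Link's system divides the work, so the partitioning scheme, the function names, and the toy data are assumptions. Threads stand in for the separate nodes, and each protein is represented by a precomputed list of peptide masses.

```python
from concurrent.futures import ThreadPoolExecutor


def search_chunk(observed, chunk, tol=0.01):
    """Score every protein in one chunk; return the best (score, name)."""
    best = (0, None)
    for name, theo_masses in chunk:
        score = sum(
            any(abs(m - t) <= tol for t in theo_masses) for m in observed
        )
        if score > best[0]:
            best = (score, name)
    return best


def parallel_search(observed, database, n_workers=4):
    """Split the database across workers and merge their best hits.

    database maps a protein name to a list of theoretical peptide
    masses (hypothetical, precomputed). Threads are used here only to
    illustrate the partitioning; a real cluster would run each chunk
    on its own node or process.
    """
    items = list(database.items())
    # Round-robin partition: worker i gets items i, i + n, i + 2n, ...
    chunks = [items[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(lambda c: search_chunk(observed, c), chunks))
    return max(results, key=lambda r: r[0])


# Toy database of precomputed peptide masses (invented values).
toy_masses = {
    "p1": [500.0, 600.0],
    "p2": [700.0, 800.0, 900.0],
    "p3": [100.0],
}
top = parallel_search([700.0, 900.005, 123.0], toy_masses)
```

Because each chunk is scored without reference to the others, the work divides cleanly, which is what lets additional nodes shorten the overall search.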

Improving processing speed will likely be the extent of the proteomics shared resource’s own bioinformatics development efforts, Friedman says.

“We’re advancing the field of proteomics by improving the technologies for protein separation and detection and by developing new technologies,” Friedman says. “We rely on the expertise of bioinformaticians to keep database searching state-of-the-art.”