Database Description


Berkeley Cancer Morphometric Data

Berkeley Morphometric Visualization and Quantification from H&E sections, sponsored by the Lawrence Berkeley National Laboratory, allows the TCGA community to download computed histology-based information and to visualize images and overlaid computed information.


Cancer Digital Slide Archive

The Cancer Digital Slide Archive (CDSA) is a browser-based, interactive tool for viewing and annotating (in beta) TCGA diagnostic and tissue slide images. Pathology reports, clinical metadata, and genomics information can also be retrieved. The CDSA is being developed and maintained by the Department of Biomedical Informatics and the Winship Cancer Institute, Emory University.


The cBioPortal for Cancer Genomics provides visualization, analysis and download of large-scale cancer genomics data sets.



The Cancer Genome Atlas (TCGA) Data Portal provides a platform for researchers to search, download, and analyze data sets generated by TCGA. It contains clinical information, genomic characterization data, and high-level sequence analysis of the tumor genomes.


The Cancer Genome Workbench

CGWB (The Cancer Genome Workbench) hosts mutation, copy number, expression, and methylation data from a number of projects, including TCGA and TARGET.

The Cancer Imaging Archive

TCIA is a large archive of medical images of cancer accessible for public download. Registration is free. The images are organized as “Collections”, typically patients related by a common disease (e.g. lung cancer), image modality (MRI, CT, etc.) or research focus.

User's Guide


Chemical Information

Database Description



Search for information about chemical compounds, substances, and BioAssays

PubChem Help

PubChem Bioassay

The PubChem BioAssay Database contains bioactivity screens of chemical substances described in PubChem Substance. It provides searchable descriptions of each bioassay, including descriptions of the conditions and readouts specific to that screening procedure.

PubChem Help
PubChem Compound

The PubChem Compound Database contains validated chemical depiction information provided to describe substances in PubChem Substance. Structures stored within PubChem Compound are pre-clustered and cross-referenced by identity and similarity groups.

PubChem Help

PubChem Substance

The PubChem Substance Database contains descriptions of samples, from a variety of sources, and links to biological screening results that are available in PubChem BioAssay. If the chemical contents of a sample are known, the description includes links to PubChem Compound.

PubChem Help


Database Description

ClinicalTrials.gov is a registry and results database of publicly and privately supported clinical studies of human participants conducted around the world.

Help for Researchers


ClinVar aggregates information about genomic variation and its relationship to human health.

What is ClinVar?
Genetic Testing Registry

Find all types of GTR records, including tests, conditions/phenotypes, genes, and labs.

GTR Help


Organizes information related to human medical genetics, such as attributes of conditions with a genetic contribution.

MedGen Help


OMIM is a comprehensive, authoritative compendium of human genes and genetic phenotypes that is freely available and updated daily. OMIM is authored and edited at the McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, under the direction of Dr. Ada Hamosh. Its official home is


Gene function, expression and regulation

Database Description



AmiGO is a search engine and database for the Gene Ontology (GO) project,  a collaborative effort to address the need for consistent descriptions of gene products across databases.

AmiGO Manual

ArrayExpress

ArrayExpress is an archive of functional genomics data from high-throughput experiments that provides these data for reuse to the research community.

ArrayExpress Help

BioCyc

The BioCyc collection of Pathway/Genome Databases (PGDBs) provides a reference on the genomes and metabolic pathways of thousands of sequenced organisms.

User's Guide

ENCODE investigators employ a variety of assays and methods to identify functional elements. The discovery and annotation of gene elements is accomplished primarily by sequencing a diverse range of RNA sources, comparative genomics, integrative bioinformatic methods, and human curation. Regulatory elements are typically investigated through DNase hypersensitivity assays, assays of DNA methylation, and immunoprecipitation (IP) of proteins that interact with DNA and RNA, i.e., modified histones,

Getting started with ENCODE


Explore, view, and download genome-wide maps of DNA and histone modifications from our diverse collection of epigenomic data sets


GEO datasets

This database stores curated gene expression DataSets, as well as original Series and Platform records in the Gene Expression Omnibus (GEO) repository. Enter search terms to locate experiments of interest. DataSet records contain additional resources including cluster tools and differential expression queries.

GEO Documentation

GEO Profiles

This database stores individual gene expression profiles from curated DataSets in the Gene Expression Omnibus (GEO) repository. Search for specific profiles of interest based on gene annotation or pre-computed profile characteristics.

GEO Documentation


The PhenoGen Informatics website is a comprehensive toolbox for storing, analyzing, and integrating microarray data and related genotype and phenotype data. This tool provides a way to visualize genomic sequencing, RNA-Seq, and microarray data for rats and mice.

PhenoGen Help

TCGA

The Cancer Genome Atlas (TCGA)


UniGene computationally identifies transcripts from the same locus, analyzes expression by tissue, age, and health status, and reports related proteins (protEST) and clone resources.



Database Description


NCBI Human Genome Resources

A challenge facing researchers today is that of piecing together and analyzing the plethora of data currently being generated through the Human Genome Project and scores of smaller projects. NCBI's Web site serves as an integrated, one-stop genomic information infrastructure for biomedical researchers from around the world so that they may use these data in their research efforts.

Using MapViewer


The HUGO Gene Nomenclature Committee (HGNC) database is a curated online repository of HGNC-approved gene nomenclature, gene families and associated resources including links to genomic, proteomic and phenotypic information.

Ensembl Genome Browser

The Ensembl project's representation of human genomic data, including genome assembly, gene annotation, comparative genomics, variation and regulation.

Ensembl Help

UCSC Genome Informatics

The UCSC Genome site contains the human reference sequence. It also provides portals to ENCODE data at UCSC (2003 to 2012) and to the Neandertal project.


1000 Genomes Project

The 1000 Genomes Browser allows users to explore variant calls, genotype calls and supporting sequence read alignments that have been produced by the 1000 Genomes project.








The Ensembl project produces genome databases for vertebrates and other eukaryotic species, and makes this information freely available online.

Ensembl Workshop Materials


EuPathDB Bioinformatics Resource Center for Biodefense and Emerging/Re-emerging Infectious Diseases is a portal for accessing genomic-scale datasets associated with the eukaryotic pathogens in the following websites: AmoebaDB, CryptoDB, FungiDB, GiardiaDB, MicrosporidiaDB, PiroplasmaDB, PlasmoDB, ToxoDB, TrichDB, TriTrypDB, OrthoMCL.


Integrated Microbial Genomes (IMG) and metagenomes supports the annotation, analysis and distribution of microbial genome and metagenome datasets sequenced at DOE's Joint Genome Institute (JGI).


Microbial Genome Database

The Microbial Genome Database (MBGD) facilitates comparative analysis of completely sequenced microbial genomes, the number of which is now growing rapidly. The aim of MBGD is to facilitate comparative genomics from various points of view such as ortholog identification, paralog clustering, motif analysis and gene order comparison.

Microbial Genome Resources

Microbial Genomes Resources contains public data from prokaryotic genome sequencing projects. The sequence collection contains data from finished genomes as well as draft assemblies.


Mouse Genome Informatics

MGI is the international database resource for the laboratory mouse, providing integrated genetic, genomic, and biological data to facilitate the study of human health and disease.


PortEco

PortEco is a next-generation resource for knowledge and data about the biology of Escherichia coli K-12 group strains (these are laboratory strains and are not pathogenic), its bacteriophages, plasmids, and mobile genetic elements. PortEco is being developed by a national consortium of both laboratory biologists and computational biologists, and is funded by a grant from the U.S. National Institutes of Health.


Rat Genome Database

The Rat Genome Database was created to serve as a repository of rat genetic and genomic data, as well as mapping, strain, and physiological information. It also facilitates investigators' research efforts by providing tools to search, mine, and analyze these data.

Rat Community

TAIR

The Arabidopsis Information Resource (TAIR) maintains a database of genetic and molecular biology data for the model higher plant Arabidopsis thaliana. Data available from TAIR include the complete genome sequence along with gene structure, gene product information, gene expression,

Help

UCSC Genome Browser

The UCSC Genome site contains the reference sequence and working draft assemblies for a large collection of genomes. It also provides portals to ENCODE data at UCSC (2003 to 2012) and to the Neandertal project.



Immunology Databases

Database Description


Antibodies Online

Antibodies Online is an online marketplace for proteomics that contains more than 1 million research antibodies, ELISA kits and related products from 150 suppliers. It makes comparing products easy by standardizing the relevant information and validating product data and experimental details.



A Database of Immunodominant B cell Epitopes

Information and Help

The dbMHC database provides an open, publicly accessible platform for DNA and clinical data related to the human Major Histocompatibility Complex (MHC).

IEDB contains experimental data characterizing antibody and T cell epitopes studied in humans, non-human primates, and other animals and includes epitopes involved in infectious disease, allergy, autoimmunity, and transplant.



IMGT specializes in the sequences, genes and structures of immunoglobulins (IG) or antibodies, T cell receptors (TR), major histocompatibility (MH) proteins of vertebrates, IgSF and MhSF superfamily proteins of vertebrates and invertebrates, fusion proteins for immunological applications (FPIA) and composite proteins for clinical applications (CPCA).


Find murine models of immune processes and immunological diseases.


NetMHC 3.4 Server

The NetMHC 3.4 server predicts binding of peptides to a number of different HLA alleles using artificial neural networks (ANNs).


Database Description



The Conserved Domain Database is a resource for the annotation of functional units in proteins. Its collection of domain models includes a set curated by NCBI, which utilizes 3D structure to provide insights into sequence/structure/function relationships.

CDD Help


Protein Data Bank (PDB) is an archive of information about the 3D shapes of proteins, nucleic acids, and complex assemblies that helps students and researchers understand all aspects of biomedicine and agriculture, from protein synthesis to health and disease.

Understanding PDB Data

The Protein database is a collection of sequences from several sources, including translations from annotated coding regions in GenBank, RefSeq and TPA, as well as records from SwissProt, PIR, PRF, and PDB. Protein sequences are the fundamental determinants of biological structure and function.

Entrez Help
Protein Clusters

This collection of related protein sequences (clusters) consists of proteins derived from the annotations of whole genomes, organelles and plasmids. It is currently limited to Archaea, Bacteria, Plants, Fungi, Protozoans, and Viruses.

Protein Clusters Help


STRING is a database of known and predicted protein interactions. The interactions include direct (physical) and indirect (functional) associations; they are derived from four sources: genomic context, high-throughput experiments, coexpression, and previous knowledge.


The Structure database contains 3D protein structures and allows users to retrieve specific subsets of resolved protein structures, find structural templates for proteins, find structures that are similar in 3D shape, and view 3D structures. It is also referred to as the Molecular Modeling Database (MMDB).


How To

The Human Protein Atlas

The Human Protein Atlas (HPA) portal is a publicly available database with millions of high-resolution images showing the spatial distribution of proteins in 44 different normal human tissues and 20 different cancer types, as well as 46 different human cell lines.

About HPA


Database Description


Google Scholar

Google Scholar provides a simple way to broadly search for scholarly literature. From one place, you can search across many disciplines and sources: articles, theses, books, abstracts and court opinions, from academic publishers, professional societies, online repositories, universities and other web sites. Google Scholar helps you find relevant work across the world of scholarly research.

Search Tips

PubMed comprises more than 24 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may include links to full-text content from PubMed Central and publisher web sites.

Quick Start

NLM Tutorials

Library Classes

PubMed PubReMiner

Detailed analysis of PubMed Search results


PubServer collects homologous sequences from the NR database and retrieves and filters associated publications.


Web of Science

The Web of Science (formerly Web of Knowledge) is today's premier research platform, helping you quickly find, analyze, and share information in the sciences, social sciences, arts, and humanities. You get integrated access to high quality literature through a unified platform that links a wide variety of content with one seamless search.




Database Description



Gene integrates information from a wide range of species. A record may include nomenclature, Reference Sequences (RefSeqs), maps, pathways, variations, phenotypes, and links to genome-, phenotype-, and locus-specific resources worldwide.

Gene Help

This resource organizes information on genomes including sequences, maps, chromosomes, assemblies, and annotations.


An automated system for constructing putative homology groups from the complete gene sets of a wide range of eukaryotic species.

Query Tips

The Nucleotide database is a collection of sequences from several sources, including GenBank, RefSeq, TPA and PDB. Genome, gene and transcript sequence data provide the foundation for biomedical research and discovery.

Entrez sequences help

The Protein database is a collection of sequences from several sources, including translations from annotated coding regions in GenBank, RefSeq and TPA, as well as records from SwissProt, PIR, PRF, and PDB. Protein sequences are the fundamental determinants of biological structure and function.

Entrez sequences help


Database Description



Access page for all NCBI variation databases (dbSNP, dbVar, dbGaP, ClinVar, GTR)

Variation Handbook



ClinVar aggregates information about genomic variation and its relationship to human health.

What is ClinVar?

The database of Genotypes and Phenotypes (dbGaP) was developed to archive and distribute the results of studies that have investigated the interaction of genotype and phenotype. Such studies include genome-wide association studies, medical sequencing, molecular diagnostic assays, as well as association between genotype and non-clinical traits.


dbGaP Tutorial



Database of single nucleotide polymorphisms (SNPs) and multiple small-scale variations that include insertions/deletions, microsatellites, and non-polymorphic variants.


dbSNP Handbook


Fact Sheet (will download as a PDF)


Database of genomic structural variation including insertions, deletions, duplications, inversions, deletion-insertions, mobile element insertions, translocations, and complex rearrangements.

Structural Variation Overview



Fact Sheet (will download as a PDF)

1000 Genomes Browser 



The 1000 Genomes Project is the first project to sequence the genomes of a large number of people, to provide a comprehensive resource on human genetic variation. The goal of the 1000 Genomes Project is to find most genetic variants that have frequencies of at least 1% in the populations studied by performing low coverage sequencing on a large number of individuals.

Ensembl Tutorial (will download as a Word document)



TCGA Data Portal

The Cancer Genome Atlas (TCGA) Data Portal provides a platform for researchers to search, download, and analyze data sets generated by TCGA. It contains clinical information, genomic characterization data, and high-level sequence analysis of the tumor genomes.

About TCGA