
GeSNP

The use of high-density oligonucleotide arrays to measure thousands of mRNA abundance levels in parallel has become commonplace. To take further advantage of this growing body of data, and to enable others to do so, we have developed a method and computer program that mine the hybridization patterns in oligonucleotide array-based gene expression data to identify genes with sequence differences. The program enables the broad, unbiased and opportunistic extraction of genetic information from new or pre-existing gene expression data obtained with high-density oligonucleotide arrays.

Use the form below to upload your data. The program takes as input the un-normalized data in Affymetrix .CEL files (as output by GeneChip Operating Software). Files in binary format must first be converted to text-format .CEL files. A conversion tool developed by Affymetrix can be downloaded from their website (a free Affymetrix account is required to download) and used to make the conversion to text format (IMPORTANT: choose the conversion mode "Version 4 to 3").

GeSNP currently supports the following Affymetrix arrays: HG-U133 Plus 2, HG-U133A, HG-U133B, HT_HG-U133A, HT_HG-U133B, HG-U95Av2, Mouse 430 2.0, Mouse 430A 2.0, MG-U74Av2, RG-U34A, Drosophila 2.0, ATH1-121501, Sugar_Cane, Rice, YG_S98, Bovine, Porcine, Rhesus and Canine_2. If you would like to analyze additional Affymetrix arrays on the GeSNP website, please contact Jenn and Matt. Please include the exact array type and the filename ending in .1sq that appears in the .CEL file header (e.g., "MG_U74Av2.1sq"). The new library file should be available within a week.
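The two checks described above (is the file in text format, and which .1sq name does its header carry) can be sketched in a few lines of Python. This is only an illustrative helper, not part of GeSNP: the function name `describe_cel` is hypothetical, and it assumes that a text-format .CEL file begins with a `[CEL]` section marker and that the .1sq name appears within the first 64 KB of the file.

```python
import re

def describe_cel(path, max_bytes=65536):
    """Return (is_text, library_name) for one .CEL file.

    Assumption: text-format files start with the marker "[CEL]".
    library_name is the first token ending in ".1sq" found near the
    start of the file, or None if no such token is present.
    """
    with open(path, "rb") as handle:
        head = handle.read(max_bytes)
    is_text = head.lstrip().startswith(b"[CEL]")
    match = re.search(rb"[A-Za-z0-9_\-]+\.1sq", head)
    library = match.group(0).decode("ascii") if match else None
    return is_text, library
```

A file for which `is_text` is false most likely still needs the "Version 4 to 3" conversion, and the returned library name is the string to quote when requesting support for a new array.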

To use the application, divide an appropriate and complementary set of Affymetrix .CEL files into the two groups to be compared, and zip-archive each group into its own file, giving two zip files.

Example files are provided for Group A and Group B to demonstrate how the files should be zipped and what the contents should look like. .CEL filenames should be alphanumeric and must not contain spaces (underscores and dashes are okay).
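As a rough sketch of the preparation step, the Python below checks each filename against the naming rule and writes one zip archive per group. The helper name `zip_group` is hypothetical. The sketch assumes the .CEL files sit at the top level of the archive (the provided example files show the layout actually expected), and its pattern is slightly stricter than the rule above in that it allows no dot other than the one before the extension.

```python
import re
import zipfile
from pathlib import Path

# Letters, digits, underscores and dashes, followed by a .CEL extension.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+\.CEL", re.IGNORECASE)

def zip_group(cel_paths, archive_path):
    """Zip one group of .CEL files after validating their filenames."""
    paths = [Path(p) for p in cel_paths]
    bad = [p.name for p in paths if not SAFE_NAME.fullmatch(p.name)]
    if bad:
        raise ValueError(f"rename these files before zipping: {bad}")
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for p in paths:
            # Assumption: store files flat, without directory components.
            archive.write(p, arcname=p.name)
    return archive_path
```

Calling it once per group, for example `zip_group(group_a_files, "groupA.zip")` and `zip_group(group_b_files, "groupB.zip")`, yields the two archives to upload.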

A comma-delimited text file containing the results is returned to the user as output. This file contains the following columns: Probe Set, Probe Pair #, Probe Set-Probe Pair #, # of files Passing Group A, Mean Group A, Variance Group A, # of files Passing Group B, Mean Group B, Variance Group B, and T-value. T-values above user-defined thresholds indicate the possibility of sequence variation between Group A and Group B.
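One way to screen the returned file is sketched below in Python. It is not part of GeSNP: the function name `candidate_probe_pairs` is hypothetical, columns are read by position in the order listed above (so T-value is the tenth column), and comparing the absolute value of T against the threshold is an assumption made here so that differences in either direction are reported. Rows whose tenth field is not numeric, such as a header row if one is present, are skipped.

```python
import csv

T_COLUMN = 9  # zero-based position of T-value in the ten listed columns

def candidate_probe_pairs(results_path, threshold):
    """Return (probe_set, probe_pair, t_value) rows with |T| above threshold."""
    hits = []
    with open(results_path, newline="") as handle:
        for row in csv.reader(handle):
            if len(row) <= T_COLUMN:
                continue  # blank or truncated line
            try:
                t_value = float(row[T_COLUMN])
            except ValueError:
                continue  # header row or non-numeric entry
            if abs(t_value) > threshold:
                hits.append((row[0].strip(), row[1].strip(), t_value))
    # Strongest candidates first.
    hits.sort(key=lambda hit: abs(hit[2]), reverse=True)
    return hits
```

The threshold itself is left to the user, as on the website; the sketch only ranks the probe pairs that exceed it.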

Greenhall, J.A., Zapala, M.A., Caceres, M., Libiger, O., Barlow, C., Schork, N.J., and Lockhart D.J. 2007. Detecting genetic variation in microarray expression data. Genome Res. 17: 1228-35.

Source code for GeSNP is available here, and can be compiled using GCC by typing g++ -o gesnp gesnp.cpp.

Group A Zipped Data Files to be Uploaded

Send Result As...

[Note: Use e-mail option for large comparisons (greater than 100MB).]

Group B Zipped Data Files to be Uploaded

Run the Program

contact webmaster: Charles Abney


Last update: 07/09/07 03:00:03 PM