July 14, 2013

Rosenhan experiment

From Wikipedia, the free encyclopedia

The Rosenhan experiment was a famous experiment done in order to determine the validity of psychiatric diagnosis, conducted by psychologist David Rosenhan and published by the journal Science in 1973 under the title "On being sane in insane places".[1] The study is considered an important and influential criticism of psychiatric diagnosis.[2]

July 7, 2013

Investing Strategy

Identify index mutual funds that you would like to own.
Build cash for investment.
Invest 10% of free cash into attractive investments when they undergo a 5% correction.
Invest another 10% of free cash into attractive investments when they undergo a 10% correction.
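
The two correction rules above can be sketched in a few lines. This is only an illustration: the function name is hypothetical, and it assumes each 10% tranche is measured against the free cash held before any purchase, which the note does not specify.

```python
def planned_investment(free_cash, correction_pct):
    """Cash to have deployed once an investment has corrected by correction_pct.

    Assumption (not stated in the note): each tranche is 10% of the
    free cash held before any purchase was made.
    """
    tranches = 0
    if correction_pct >= 5:
        tranches += 1  # first 10% goes in at a 5% correction
    if correction_pct >= 10:
        tranches += 1  # another 10% goes in at a 10% correction
    return tranches * free_cash / 10
```

For example, with 10,000 in free cash, a 12% correction would have triggered both tranches, for 2,000 invested.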

randomness


Microarray Home
Introduction
Services and Projects
GeneChip Expression
GeneChip Genotyping
Custom Microarrays
Data Analysis
People


Microarray Home

The Microarray Resource provides microarray analysis service and technical expertise to all researchers at Boston University and to interested groups from outside the university.  We are located on the 6th floor of the Evans (E) building at the BU Medical Campus. 

                                             

                   Affymetrix GeneChip                                 Custom Oligonucleotide Microarray

We offer two different microarray platforms at the Microarray Resource: Affymetrix GeneChips and custom oligonucleotide microarrays.  For more information go to the Services [link] section of the webpage.  For both platforms, users of the facility provide us with high quality samples and the Microarray Resource does the rest.

Introduction

Services
      GeneChip Expression Analysis                       [Prices]
      GeneChip Genotyping Analysis                     [Prices]
      Custom Oligonucleotide Microarrays             [Prices]

Data Analysis

Microarray Resource People

Other information
Starting Material for Microarray Analysis
Protocols
Grant Materials
Links                     

All interested researchers are encouraged to contact the Microarray Resource to discuss a potential project or to ask any question about the use of microarrays.

Dr. Norman Gerry 617-414-1219                     npgerry@bu.edu
Dr. Marc Lenburg 617-414-1375                     mlenburg@bu.edu
Microarray Resource 617-414-1377

We look forward to talking with you!



Introduction

A biological array is an ordered set of compounds affixed to a solid surface.  By applying a sample solution to the array, it is possible to assay the interaction of the sample with each of the compounds on the array.  Microarrays are a powerful research tool because they enable massively parallel assays of biological samples.


The most common application of microarrays is gene expression analysis.  In this case the interaction between labeled mRNA in a biological sample and complementary DNA probes, which are affixed to the array, allows rapid quantification of the entire transcriptome.  Many other types of arrays are possible.  Nucleic acid arrays can also be used for genotyping and antibody arrays can be used for proteomics.

We offer two different microarray platforms at the Microarray Resource.  The Affymetrix GeneChip system is a commercial platform that enables rapid, reproducible, and accurate microarray analysis on a genome-wide scale.  The Affymetrix GeneChip platform can be used for both gene expression and genotyping analysis.

We also offer made-to-order microarrays that are manufactured by synthesizing oligonucleotide probes and spotting them onto glass microarray slides here at the Microarray Resource.  These “custom” microarrays are able to detect hundreds to thousands of genes of specific interest to individual investigators.  The custom microarrays are an extremely flexible platform.  They can be used for gene expression analysis of almost any organism in addition to many other types of projects.  Additionally, by making our own microarrays, we are able to significantly reduce the cost of performing microarray experiments.

At the Boston University Microarray Resource we strive to allow you to easily incorporate microarrays into your research and help you get the most out of your data.  You provide us with high quality samples and the Microarray Resource does the rest.



Services Offered by the Boston University Microarray Resource



Affymetrix GeneChips

The Affymetrix GeneChip™ system is a commercial microarray platform that allows whole genome gene expression analysis for common experimental organisms and high-throughput genotyping for human samples.

Custom Oligonucleotide Microarrays
The Microarray Resource will make custom arrays in-house by spotting oligonucleotide probes onto glass slides.  This is an extremely flexible platform allowing focused microarray analysis for any organism.



What types of projects can I use Microarrays for?

Affymetrix GeneChip Expression [link]
-          Genome wide expression analysis for common experimental organisms
o   Expression analysis to investigate cellular biology
o   Expression profiling to categorize biological samples

Custom Oligonucleotide Microarray [link]
-          Expression analysis for all organisms
o   Expression analysis to investigate cellular biology
o   Expression profiling to categorize biological samples
-          Chip on Chip
-          Gene Copy number
-          Spotting user-provided cDNA, protein, antibody, or small-molecule libraries

Affymetrix GeneChip Genotyping [link]
-          High Throughput Human Genotyping


Affymetrix GeneChip for Expression Analysis

The Affymetrix GeneChip system is a commercial microarray platform that allows whole genome gene expression analysis for common experimental organisms.  This system has three major advantages over other array systems: it produces results rapidly, it can monitor the expression of every gene in the genome, and it is the most widely used commercial microarray platform.

However, the Affymetrix system also has a few disadvantages when compared with the Microarray Resource’s custom array system.  The GeneChip platform is significantly more expensive than custom microarrays, and Affymetrix only makes GeneChip arrays for common experimental organisms.

Getting started with Affymetrix GeneChips is easy
-           Set up an appointment with members of the Microarray Resource to discuss your experiment.  This is optional, but highly recommended.
-           Give us 10 µg total RNA [more information about starting RNA link]
-           Wait 1 week for us to process your samples
-           Work with microarray core to analyze data

Available GeneChips for expression profiling and BUSM prices [Link]   Contact Us [Link]



Affymetrix GeneChip for Genotyping Analysis

Genotyping with Affymetrix GeneChips
The Affymetrix GeneChip™ system is a commercial microarray platform that allows high-throughput genotyping for human samples.  The GeneChip® Mapping 10K Array offers the ability to generate over 10,000 SNP genotypes from a single genomic DNA sample.  The 100K array, which will be released next year, will probe over 100,000 SNPs.



Getting started with Affymetrix GeneChips is easy
-           Set up an appointment with members of the Microarray Resource to discuss your experiment.  This is optional, but highly recommended.
-           Give us 250 ng genomic DNA [more information about starting DNA link]
-           Wait 1 week for us to process your samples
-           Work with microarray core to analyze data

Please contact the microarray resource for more information about genotyping using the Affymetrix GeneChip platform

GeneChips for SNP genotyping and BUSM prices [Link]                Contact Us [Link]


Custom Oligonucleotide Microarrays

Our primary goal is to make microarray analysis more accessible to all researchers at BU.  We hope that microarray analysis of gene expression will become a method that researchers consider part of their regular repertoire of experimental approaches, the way a Northern blot is now.  One of the most important ways that we can do this is by making the technology inexpensive.  Manufacturing custom oligonucleotide microarrays in-house will enable enormous cost savings.  The custom microarray system will allow researchers to choose a collection of genes of interest for their own research-specific microarray.  Another advantage of custom microarrays is that, while Affymetrix arrays are targeted for expression analysis and genotyping, any sequence can be spotted on a custom array.  This opens up a wide range of additional applications, such as analysis of sequences enriched in chromatin immunoprecipitations, detection of region-specific differences in copy number, pathogen detection, and many more.  Even in the area of gene expression analysis, custom microarrays have the advantage that they can be designed for any organism.


There are a number of ways to select genes for a custom microarray.  Known genes of interest can be the primary source but this approach can be supplemented with literature and database-mining or with preliminary experiments using Affymetrix whole-genome microarrays.

The Microarray Resource makes custom arrays in-house by spotting and covalently cross-linking oligonucleotide probes onto glass slides.  Probes for genes of interest are designed using sophisticated software that determines the best 50-70 nucleotide probe sequence for each gene.  These probes are then synthesized on an ABI 3900 high throughput DNA synthesizer.  The probes will be spotted onto derivatized glass slides using a Genetix QArray-Mini™ custom array spotter.  Once the arrays are made RNA samples from investigators are labeled and hybridized to these arrays and scanned with a Packard ScanArray Express™ multi-channel microarray scanner.  We will also spot user-provided libraries.

In order to make sure that you get the most out of your microarray experiment, the Microarray Resource will help you analyze your data.  This includes guidance on experimental design and statistical analysis, as well as access to software that will allow sophisticated data mining and visualization.

Getting started with Custom Arrays is easy
-           Set up an appointment with members of the Microarray Resource to discuss your experiment.  This is optional, but highly recommended.
-           Determine a list of genes for your custom microarray
-           Wait a few weeks while we design and manufacture your microarrays.
-           Give us 5 µg total RNA [more information about starting RNA link]
-           Wait 1 week for us to process your samples
-           Work with microarray core to analyze data


Custom Oligonucleotide Microarray Prices [Link]                             Contact Us [Link]

Affymetrix GeneChip Prices for Expression Analysis

Affymetrix currently makes GeneChip expression arrays for 9 organisms.  The cost of the arrays varies from $300-$350 and is detailed below.  
Available Organisms
Human
Mouse
Rat

Yeast (cerevisiae)
Drosophila
P. aeruginosa
Arabidopsis
E. coli
C. elegans
B. subtilis
Barley

For human, mouse, and rat GeneChips, the entire transcriptome is split between two Affymetrix GeneChips.  In each case, the A chips contain the best annotated genes from the organism, while B chips contain mostly ESTs, splice variants, and poorly annotated transcripts.  The cost of running both A and B chips for human, mouse and rat samples is less than double the cost of running just the A chip because the processed RNA from a single sample can be hybridized to multiple arrays.

Organism        Genes     Annotated Genes         Chip   Reagents   Labor   Total
Human U133A     ~22,500   19,993 w/ Gene Symbol   $350   $200       $300    $850
Human U133B     ~22,500   10,043 w/ Gene Symbol   $350   $25        $150    $525
Mouse MOE430A   ~22,500   ?                       $350   $200       $300    $850
Mouse MOE430B   ~22,500   ?                       $350   $25        $150    $525
Rat ROE430A     ~16,000   ?                       $350   $200       $300    $850
Rat ROE430AB    ~16,000   ?                       $350   $25        $150    $525
Arabidopsis     ~16,000   ?                       $300   $200       $300    $800
C. elegans      ~22,500   ?                       $300   $200       $300    $800
Drosophila      ~13,500   ?                       $300   $200       $300    $800
Yeast SG-98     ~7,000    4,181 w/ Gene Symbol    $300   $200       $300    $800
E. coli         ~5,500    ?                       $300   $200       $300    $800
P. aeruginosa   ~6,000    ?                       $300   $200       $300    $800

Chip
Affymetrix GeneChips are available to us at the Boston Academic Consortium prices.  Due to the nature of our agreements with Affymetrix, we can only offer chips at these prices to academic investigators.  Other groups are encouraged to purchase their GeneChips from Affymetrix, and bring them to the microarray resource for hybridization.

Reagents and Labor
For each sample to be prepared for GeneChip analysis there is a cost of $500 for reagents and labor.  One sample is hybridized to each GeneChip microarray.  If the sample is to be hybridized to a set of A and B chips then the reagent and labor cost is $675.  Our charge for labor is very competitive with other facilities.  Remember, we provide comprehensive assistance with data analysis, a service unique to the Boston University Microarray Resource.

Experimental Design and Data Analysis
In order to make sure that you get the most out of your microarray experiment, the Microarray Resource will help you analyze your data.  This includes guidance on experimental design and statistical analysis, as well as access to software that will allow sophisticated data mining and visualization.


Affymetrix GeneChip Prices for Genotyping

Affymetrix currently makes one GeneChip for SNP genotyping.

Human Mapping 10K Array

Affymetrix anticipates releasing a similar 100K Mapping Array in 2004.


Array       SNPs      Chip   Reagents & Labor   Total
Human 10K   >10,000   $400   ?                  ?

Chip
Affymetrix GeneChips are available to us at the Boston Academic Consortium prices.  Due to the nature of our agreements with Affymetrix, we can only offer chips at these prices to academic investigators.  Other groups are encouraged to purchase their GeneChips from Affymetrix, and bring them to the microarray resource for hybridization.

Reagents and Labor
More information to come

Experimental Design and Data Analysis
In order to make sure that you get the most out of your microarray experiment, the Microarray Resource will help you analyze your data.  This includes guidance on experimental design and statistical analysis, as well as access to software that will allow sophisticated data mining and visualization.

Custom Microarray Prices

The cost of design and fabrication of custom microarrays is $150 per array.  This cost includes probe selection, synthesis, and spotting.

A minimum order is required on custom microarray projects, though you don't need to use -- or even necessarily print -- all of the arrays at one time.  The minimum order for a new custom array varies depending on the number of oligos in the array and the kind of array you want to make.
Minimum Order

Number of Oligos   Human, Mouse, Rat, and Yeast Expression Arrays   All Other Arrays
1-100              10 arrays                                        10 arrays
101-500            20 arrays                                        30 arrays
501-1000           40 arrays                                        60 arrays

Custom microarrays containing more than a thousand unique oligos are certainly possible.  Investigators seeking to make such arrays should contact us to discuss their project.  Discounts will be considered for projects larger than 100 arrays; investigators interested in projects of this size should also contact us.


Array               Samples       Genes             Chip   Reagents   Labor   Total
Custom Microarray   2 per array   1-1,000 or more   $150   $100       $200    $450

Reagents and Labor
For each custom microarray there is a cost of $300 for reagents and labor.  Two samples are hybridized to each custom microarray.

Experimental Design and Data Analysis
In order to make sure that you get the most out of your microarray experiment, the Microarray Resource will help you analyze your data.  This includes guidance on experimental design and statistical analysis, as well as access to software that will allow sophisticated data mining and visualization.

Compared with the Affymetrix system, custom-array experiments can be done at much lower cost.  The custom array itself costs only $150 per array, which is at most half the cost of a GeneChip array ($300-$350).  Additionally, the cost of reagents and labor for hybridizing a custom array is $300 versus $500 for GeneChip arrays.  Another important cost saving with custom arrays is that two samples, such as control and experimental, are hybridized to a single array, while only one sample can be hybridized to a GeneChip array.  Consequently, the simplest custom array experiment costs $450 vs. $1600 for the simplest Affymetrix experiment.
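
The $450 versus $1600 comparison can be reproduced with a short calculation.  This is a sketch only: the dictionary and function names are hypothetical, and it assumes that the "simplest experiment" means one control and one experimental sample and that the GeneChip is priced at $300.

```python
# Per-array prices quoted in the price tables (US dollars).
CUSTOM = {"chip": 150, "reagents_labor": 300, "samples_per_array": 2}
# Assumption: a $300 GeneChip (the non-mammalian price), one sample per chip.
GENECHIP = {"chip": 300, "reagents_labor": 500, "samples_per_array": 1}


def experiment_cost(platform, n_samples):
    """Total cost of hybridizing n_samples on the given platform."""
    per_array = platform["samples_per_array"]
    n_arrays = -(-n_samples // per_array)  # ceiling division
    return n_arrays * (platform["chip"] + platform["reagents_labor"])


custom_total = experiment_cost(CUSTOM, 2)      # one array holds both samples
genechip_total = experiment_cost(GENECHIP, 2)  # two chips, one sample each
```

Under these assumptions custom_total is 450 and genechip_total is 1600, matching the figures above.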


Starting RNA for Affymetrix GeneChip Expression Analysis

-     10 µg high quality total RNA preferred.
-     Small-sample amplification protocols are available that facilitate GeneChip expression analysis from samples as small as 100 ng.
-     Supply RNA in less than 10 µl of water (we will dry down dilute samples).
-     DNase treatment is not necessary.  Small amounts of genomic DNA contamination will not affect the results of microarray analysis.
-     Poly-A selected RNA can be used for microarray analysis, but this is not recommended unless previous studies were conducted using poly-A selected RNA.
-     RNA extraction protocols [link]
-     The most common problem with RNA that we encounter in the microarray core facility is carryover organic contamination from the extraction.  This organic contamination will cause sample preparation reactions to fail. 


Starting DNA for Affymetrix GeneChip SNP Genotyping
-     250 ng high quality genomic DNA
-     DNA extraction protocols [link]

Starting RNA for Custom Microarray Analysis
-     5 µg high quality total RNA preferred.
-     In water
-     DNase treatment is not necessary.  Small amounts of genomic DNA contamination will not affect the results of microarray analysis.
-     Poly-A selected RNA can be used for microarray analysis, but this is not recommended unless previous studies were conducted using poly-A selected RNA.
-     RNA extraction protocols [link]
-     The most common problem with RNA that we encounter in the microarray core facility is carryover organic contamination from the extraction.  This organic contamination will cause sample preparation reactions to fail. 


Microarray Protocols

GeneChip Expression Analysis Protocol [Link]
            Sample Preparation
            Hybridization, Staining, and Scanning
           
GeneChip Genotyping Protocol [Link]
            Sample Preparation
            Hybridization, Staining, and Scanning

Custom Oligonucleotide Microarray Protocol [Link]
            Array Production Protocols
            Sample Preparation and Hybridization Protocols

RNA extraction protocols

DNA extraction protocols

Data Analysis Methods



Data Analysis


Data Pre-processing

Introduction

Following microarray hybridization and scanning, a few steps are needed to create a data set that is ready for analysis.  These include image quantification, normalization, and annotation.

Image Quantification

For Affymetrix GeneChips, image quantification is performed using GeneChip Operating System 1.0 software (GCOS 1.0).  Starting with a scanned image, GCOS determines the intensity of each 25mer probe on the GeneChip.  Then a gene-specific intensity is calculated using the intensities of the set of probes for each gene.  This procedure is described in more detail in the Affymetrix Statistical Algorithms Reference Guide [https://www.affymetrix.com/support/technical/technotes/statistical_reference_guide.pdf].


Chip to Chip Normalization

Following within-chip image quantification, it is necessary to normalize the data across chips in order to make measurements as comparable as possible across chips.  There are a number of different normalization methods, but in general more complex methods will do a better job of normalization at the risk of overfitting.  Furthermore, as more samples are added to a microarray data-set, chip to chip differences become less important.  This makes complex normalization less important.  In general the Microarray Resource uses the simplest normalization method, linear scaling.

Linear Scaling

In linear scaling, the intensity of each gene on a chip is multiplied by a constant such that the average intensity of all the genes on that chip is scaled to a predetermined target.
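
As a minimal sketch of linear scaling, the function below rescales one chip at a time.  The function name and the default target of 500 are illustrative assumptions, not values used by the Microarray Resource.

```python
def linear_scale(intensities, target=500.0):
    """Scale one chip so that its mean intensity equals target.

    intensities: list of gene intensities from a single chip.
    target: hypothetical predetermined target mean.
    """
    chip_mean = sum(intensities) / len(intensities)
    factor = target / chip_mean  # one constant for the whole chip
    return [x * factor for x in intensities]
```

Because every gene on a chip is multiplied by the same constant, ratios between genes on that chip are unchanged; only the overall brightness of the chip is adjusted.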

Quantile Normalization

In quantile normalization, the intensity of each gene is ranked within each chip.  The average intensity across all chips of each rank is then calculated.  Finally, on each chip, the intensity of each gene is replaced by the average intensity of the gene of that rank across all chips.
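
The three steps described above (rank within each chip, average each rank across chips, substitute the rank average) might be sketched as follows.  The function name is hypothetical, and ties are broken by gene order, which is a simplifying assumption.

```python
def quantile_normalize(chips):
    """Quantile-normalize a list of chips.

    chips: list of equal-length lists, one list of gene intensities per chip.
    Returns new lists with the same gene order as the input.
    """
    n_genes = len(chips[0])
    # Sort each chip so that position r holds the intensity of rank r.
    sorted_chips = [sorted(chip) for chip in chips]
    # Average intensity of each rank across all chips.
    rank_means = [sum(s[r] for s in sorted_chips) / len(chips)
                  for r in range(n_genes)]
    normalized = []
    for chip in chips:
        # Gene indices ordered from lowest to highest intensity on this chip.
        order = sorted(range(n_genes), key=lambda g: chip[g])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = rank_means[rank]  # replace by the rank average
        normalized.append(out)
    return normalized
```

After this transformation every chip has exactly the same set of intensity values; only the assignment of values to genes differs from chip to chip.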

Loess Normalization

In loess normalization, the intensities of the genes on a chip are normalized based on the local mean of signal intensities.

Gene to Gene Normalization

In addition to these normalization methods, which make chips comparable, there are other normalization techniques that put genes on a comparable scale.  These methods are generally used prior to clustering or principal components analysis.

Log Ratio Normalization

In Log Ratio Normalization, the expression of each gene on each chip is calculated as:
log ( intensity of gene on this chip / mean intensity of gene across all chips)
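
A short sketch of this formula for a single gene follows.  The function name is hypothetical, and the use of log base 2 is an assumption, since the base is not specified above.

```python
import math


def log_ratio_normalize(gene_intensities):
    """Log ratio of one gene's intensity on each chip to its mean across chips.

    gene_intensities: intensities of a single gene, one value per chip.
    Assumption: log base 2 (the formula above does not name a base).
    """
    mean = sum(gene_intensities) / len(gene_intensities)
    return [math.log2(x / mean) for x in gene_intensities]
```

A value of 0 means the gene is at its average level on that chip, while +1 and -1 mean twice and half the average, respectively.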

Z-Score Normalization



Annotation

Introduction

In order to effectively analyze microarray data, it is critical for investigators to have access to complete and up-to-date annotation of the genes on the array.  At the Microarray Resource we get our annotation information from two primary sources, though there are a few others that are worth mentioning.

NetAffx

Affymetrix maintains the NetAffx [Link] database containing information about the genes that are contained on their GeneChip microarrays.  This is the best first source of information about Affymetrix probe sets because each probe set has a unique page in the NetAffx database containing a broad range of information including gene and probe sequences, links to other databases, and functional descriptions of the genes.

Incyte Proteome Database

The Incyte Proteome BioKnowledge Library [Link] is now available for access by all current Boston University and Boston University Medical Center faculty, staff, and students.  This is an excellent database for finding information about genes from microarray experiments.  It is well curated and provides PubMed links for all references.  This database is indexed by gene symbol.

Other Databases (NCBI etc.)

There are a number of other databases that can provide valuable information about genes from microarray experiments:
-                      GenBank
-                      SGD  (yeast)
-                      Gene Ontology


Identifying Differentially Expressed Genes

Introduction

With microarray data, biology researchers want to identify genes differentially expressed under different growth conditions or different treatments, to cluster genes according to their expression pattern, and to differentiate samples in pharmaceutical or clinical studies.

Fold Change

The most straightforward method of identifying differentially regulated genes in a microarray experiment is by fold change.  Fold change is the multiple by which the expression of a gene changed between two experimental groups.

Fold change can be reported using various scales that each convey the same information:
Ratio: ¼, 4
Linear: -4, 4
Log base 2: -2, 2
Log base 10: approximately -0.6, 0.6

Fold Change is usually calculated using the mean of a set of measurements within an experimental group, but it can also be calculated using the geometric mean, particularly if the original measurements were not converted to logarithmic scale.

While Fold Change is an important descriptor of the behavior of a gene's expression between two experimental groups, it does not tell the whole story.  For example, take the expression of one gene measured 4 times in each of two experimental groups.

Group A:         100, 200, 200, 300                  Mean = 200
Group B:         100, 100, 200, 2800                Mean = 800

Fold Change = 4

According to Fold Change this is a differentially regulated gene, yet we can see that Group B is not reproducibly upregulated 4-fold: a single outlying measurement (2800) drives the difference.  Consequently, Fold Change should not be used as a first-pass method for identifying differentially expressed genes.
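
The Group A and Group B example can be checked numerically.  The sketch below, with hypothetical variable names, computes the fold change from the arithmetic means, expresses it on the log scales listed earlier, and repeats the calculation with geometric means.

```python
import math

group_a = [100, 200, 200, 300]
group_b = [100, 100, 200, 2800]


def mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    # exp of the mean of the logs; less sensitive to one large outlier
    return math.exp(sum(math.log(v) for v in values) / len(values))


fold_change = mean(group_b) / mean(group_a)  # 800 / 200 = 4.0
log2_fold_change = math.log2(fold_change)    # 2.0
log10_fold_change = math.log10(fold_change)  # about 0.6
geo_fold_change = geometric_mean(group_b) / geometric_mean(group_a)  # about 1.47
```

With geometric means the apparent 4-fold change shrinks to roughly 1.5-fold, which illustrates how strongly the single measurement of 2800 influences the arithmetic mean.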


Statistical Significance

A better method for identifying differentially regulated genes is provided by statistics.  Analysis of Variance (ANOVA) is a technique that assesses whether a set of measurements from two or more experimental groups indicates, given observed variance, that the groups are different.  For microarrays the measurements are the expression levels of one gene and the groups correspond to the experimental sample groups.  ANOVA is used to identify genes that are differentially expressed in a manner that is reproducible across multiple measurements within each experimental group.

An ANOVA score is calculated by comparing the variance observed between the sample group means to the variance observed within the groups.  If the between group variance is high relative to the within group variance this indicates differential expression.  The result of an ANOVA is a probability, p, that an observed difference between groups could have been produced by chance if the groups were in fact the same.

Following the use of ANOVA to calculate a p-value for each gene, it is useful to choose a p-value cut-off, below which genes will be considered differentially expressed, and above which genes will not be considered differentially expressed.  This cut-off will be arbitrary, but its choice should be made with an understanding of the trade-offs between sensitivity and selectivity that are inherent to choosing a significance cut-off.  In general, choosing a lower significance cut-off will result in fewer genes being identified as differentially expressed, but a smaller portion of those that are selected will be false-positives.  Choosing a higher significance cut-off will result in more genes being identified as differentially expressed, but a greater portion of those will be false-positives.  At any significance cut-off it is possible to estimate the associated false-positive and false-negative rates.  This allows an informed choice of the significance cut-off.

ANOVA can take a few different forms depending on the experimental design.  The most basic type of ANOVA is a one-way ANOVA.  In a one-way ANOVA, the sample groups are stratified along a single experimental variable.  The simplest one-way ANOVA, with two sample groups, is equivalent to a T-Test.  An ANOVA comparing more than two groups tests whether at least one of the groups differs from the rest; it does not indicate which group differs.  At the Microarray Resource we perform one-way as well as multiple-factor ANOVA.  Multiple-factor ANOVA differs from one-way ANOVA in that it generates p-value scores for each of the primary experimental axes as well as scores for each interaction between factors.
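
To make the between-group versus within-group comparison concrete, the sketch below computes the one-way ANOVA F statistic by hand and checks the two-group equivalence with a T-Test (F equals t squared when the t statistic uses a pooled variance).  The function names are hypothetical.  Converting F to a p-value requires the F distribution, which is left to a statistics library (for example scipy.stats.f_oneway) and is not shown here.

```python
import math


def one_way_anova_f(groups):
    """F statistic: between-group variance divided by within-group variance."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def pooled_t(a, b):
    """Two-sample t statistic assuming equal variance in both groups."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    ss = sum((v - mean_a) ** 2 for v in a) + sum((v - mean_b) ** 2 for v in b)
    pooled_var = ss / (len(a) + len(b) - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))


group_a = [100, 200, 200, 300]
group_b = [100, 100, 200, 2800]
f_stat = one_way_anova_f([group_a, group_b])  # about 0.81
t_stat = pooled_t(group_a, group_b)           # t_stat ** 2 equals f_stat
```

For the fold change example given earlier, F is below 1: the variance within the groups exceeds the variance between them, so the 4-fold difference in means is not statistically convincing.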


Multiple Hypothesis Testing

Correction of significance results for multiple hypothesis testing is an important concern in microarray data analysis.  It is common to use a p-value cut-off of 0.05.  In a microarray experiment in which 20,000 genes are measured, even if no genes are truly differentially expressed, 1,000 genes can be expected to meet the p < 0.05 significance cut-off by chance alone.  Furthermore, in the same 20,000 gene experiment with no changed genes, one unchanged gene would be expected to have a p-value as low as 0.00005.

A statistical test, like ANOVA, applied to microarray data tells you the probability that the observations made about a single gene could have been made if the null hypothesis, that the gene is not significantly changed, were true.  When applied to normally distributed random data, p-values will be evenly distributed between 0 and 1.  Thus, when looking at a single gene, a very low p-value is a significant finding, but as you increase the number of genes observed, the chance of finding a single very low p-value increases.

Take a fictitious microarray data set with 20,000 genes, none of which are differentially expressed between the experimental groups.  We will use a p-value cut-off of 0.05 to identify differentially regulated genes.  If we look at any one gene from our fictitious data set, which we know is not differentially expressed, there is a 1 in 20 chance of it having a p-value less than 0.05.  Our gene-wise false-positive rate, at this level of sensitivity, is 5%.  So, if we were to use a microarray to observe the expression of a single gene, we could use a p-value cut-off of 0.05 and control false positives at a rate of 5%.

If we use a statistical test and a p-value cut-off of p < 0.05 to identify differentially expressed genes from our fictitious microarray experiment, our gene-wise false positive rate is still 5%.  Five percent of 20,000 genes is 1,000 genes that were not actually differentially expressed but would be identified as significant at this level of sensitivity.  Testing as many hypotheses as there are genes on a microarray gives plenty of chances to make a mistake.
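
The fictitious experiment can be simulated directly.  The sketch below assumes that unchanged genes have p-values drawn evenly between 0 and 1; the function name and the random seed are arbitrary choices for illustration.

```python
import random


def count_false_positives(n_genes=20000, cutoff=0.05, seed=0):
    """Count unchanged genes that pass the cut-off purely by chance.

    Assumption: p-values of unchanged genes are uniform on [0, 1).
    """
    rng = random.Random(seed)
    p_values = [rng.random() for _ in range(n_genes)]
    return sum(p < cutoff for p in p_values)


expected = 20000 * 0.05             # 1,000 genes expected by chance
observed = count_false_positives()  # close to 1,000, varies with the seed
```

The simulated count fluctuates around the expected 1,000 from run to run, but it never approaches zero, even though no gene in the data set is truly changed.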

There are a few different methods for dealing with multiple hypothesis testing in significance analysis of microarray data.  The Bonferroni correction multiplies the p-value observed for each hypothesis by the number of hypotheses being tested.  The Bonferroni correction is usually overly stringent for microarray data analysis.  If we use a Bonferroni-corrected p-value cut-off of 0.05 on a real microarray data set, no matter how many genes meet the significance cut-off, there will be at most a 5% chance that even a single false-positive is among them.  If we identify 100 genes that are differentially expressed in an experiment, we would likely be willing to accept a few false-positives among the 100.  The Bonferroni criterion, that there is only a 5% chance that a single false-positive is among the 100, is more control of false-positives than is usually necessary.  Increasing selectivity using the Bonferroni correction reduces sensitivity, so fewer differentially regulated genes will be identified.
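
A minimal sketch of the Bonferroni correction follows; the function names are hypothetical.  Multiplying each p-value by the number of tests is equivalent to dividing the cut-off by the number of tests.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests, capping the result at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]


def bonferroni_cutoff(alpha, n_tests):
    """Raw p-value a gene must beat to pass a Bonferroni-corrected alpha."""
    return alpha / n_tests


per_gene_cutoff = bonferroni_cutoff(0.05, 20000)  # 0.0000025
```

On a 20,000 gene array, a gene needs a raw p-value below 0.0000025 to pass a corrected cut-off of 0.05, which shows why this correction sacrifices so much sensitivity.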

Another method for treating the multiple hypothesis problem makes more sense for microarray experiments.  The False Discovery Rate (FDR) correction of Benjamini and Hochberg estimates the gene-wise false-positive rate among the genes at a significance cut-off.  The FDR is the quotient of the number of unchanged genes expected at a given significance cutoff over the number of genes detected at that significance cutoff.

The assumption that unchanged genes would have p-values evenly distributed between 0 and 1 can be used to estimate the number of false-positives expected at a given significance cut-off.  The number of false-positives expected at a given significance cut-off will be equal to the number of unchanged genes (or the number of genes on the microarray) times the p-value of the significance cut-off.
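
The quotient described above, and the Benjamini and Hochberg adjustment built on it, might be sketched as follows.  The function names are hypothetical.  The first function conservatively uses the total number of genes as the number of unchanged genes, and returning 0 when no gene is detected is a convention chosen here.

```python
def estimated_fdr(p_values, cutoff):
    """Expected false positives at the cut-off divided by genes detected."""
    n_detected = sum(p <= cutoff for p in p_values)
    if n_detected == 0:
        return 0.0  # convention chosen here: nothing detected, nothing false
    expected_false = len(p_values) * cutoff
    return min(1.0, expected_false / n_detected)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

Selecting all genes whose adjusted p-value is below 0.05 then gives a list in which roughly 5% of the genes, rather than 5% of all genes on the array, are expected to be false-positives.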

Estimating the number of changed and unchanged genes

Based on two assumptions, it is possible to estimate the number of changed and unchanged genes in a microarray data set.  The first assumption is that unchanged genes will have p-values evenly distributed between 0 and 1.  The second assumption is that changed genes will not have p-values greater than a certain p-value threshold.

If there are no changed genes with p greater than the threshold then all of the genes with p greater than the threshold are unchanged.  If the unchanged genes have evenly distributed p-values, then the density of unchanged genes above the threshold will be the same as the density of unchanged genes below the threshold.  So, we calculate the density of unchanged genes above the threshold, and integrate this constant density from p equals 0 to 1.
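
Under the two assumptions above, the estimate reduces to a count and a division.  In this sketch the function name and the default threshold of 0.5 are illustrative assumptions.

```python
def estimate_unchanged(p_values, threshold=0.5):
    """Estimate the number of unchanged genes from the p-value distribution.

    Assumes unchanged genes have p-values spread evenly over [0, 1] and that
    no changed gene has a p-value above threshold (hypothetical default 0.5).
    """
    n_above = sum(p > threshold for p in p_values)
    density = n_above / (1.0 - threshold)  # unchanged genes per unit of p
    # Integrating a constant density from p = 0 to p = 1 gives the density
    # itself; the estimate cannot exceed the number of genes measured.
    return min(float(len(p_values)), density)


def estimate_changed(p_values, threshold=0.5):
    """Estimated number of changed genes: total minus estimated unchanged."""
    return len(p_values) - estimate_unchanged(p_values, threshold)
```

For a data set of 1,000 genes in which 450 have p-values above 0.5, the estimate is 900 unchanged genes and 100 changed genes.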


Advanced Analysis Techniques

Principal Components Analysis

Technique

Principal Components Analysis (PCA) is a mathematical transformation that can be applied to microarray data sets, allowing data compression and dimensionality reduction.  The primary objective is to transform the data into a new space where data analysis is easier.  Principal components analysis transforms a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables called principal components.  The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible.
The mathematical technique used in PCA requires solving for the eigenvalues and eigenvectors of the covariance matrix computed from a microarray data-set in matrix form.  The eigenvector associated with the largest eigenvalue has the same direction as the first principal component.  The eigenvector associated with the second largest eigenvalue determines the direction of the second principal component, and so on.  The maximum number of eigenvectors equals the number of columns (samples) of the microarray data-set.
At the Microarray Resource we use principal components to view the distribution of variability within the various samples that make up an experiment.
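The eigenvector calculation can be illustrated with a small sketch in plain Python.  It finds only the first principal component, using power iteration (repeatedly multiplying a vector by the covariance matrix) rather than a full eigen-decomposition, so it should be read as a teaching example under stated assumptions and not as the procedure used at the Microarray Resource.  The function names and the 5-gene by 3-sample data are hypothetical.

```python
import math


def covariance_matrix(data):
    """Covariance between columns (samples); rows (genes) are observations."""
    n_rows, n_cols = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n_rows for j in range(n_cols)]
    centered = [[row[j] - means[j] for j in range(n_cols)] for row in data]
    return [[sum(r[i] * r[j] for r in centered) / (n_rows - 1)
             for j in range(n_cols)] for i in range(n_cols)]


def first_principal_component(cov, iterations=500):
    """Power iteration: return (largest eigenvalue, unit eigenvector).

    Assumes the starting vector is not orthogonal to the leading
    eigenvector, which holds for the positively correlated data below.
    """
    n = len(cov)
    vec = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        product = [sum(cov[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            break  # all-zero covariance matrix: no direction to find
        vec = [x / norm for x in product]
    # Rayleigh quotient v' C v gives the eigenvalue for the unit vector v.
    eigenvalue = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(n))
                     for i in range(n))
    return eigenvalue, vec


# Hypothetical log-scale expression values: 5 genes (rows) x 3 samples (columns).
data = [
    [2.0, 2.1, 4.0],
    [3.0, 3.2, 6.1],
    [1.0, 0.9, 2.0],
    [4.0, 4.1, 7.9],
    [2.5, 2.4, 5.0],
]
cov = covariance_matrix(data)
eigenvalue, vec = first_principal_component(cov)
# Total variability is the sum of the diagonal of the covariance matrix.
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
print("First principal component direction:", [round(x, 3) for x in vec])
print(f"Fraction of variability explained: {explained:.3f}")
```

Because the three hypothetical samples rise and fall together, nearly all of the variability falls on the first component.  For real data-sets a numerical library with a full eigen-decomposition or singular value decomposition would be the more usual choice.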
Figure Here


Looking at samples

Looking at genes


Hierarchical Clustering


Gene Clustering example (data before clustering / data after clustering)

Sample Clustering example (data before clustering / data after clustering)


At the Microarray Resource we use hierarchical clustering to visualize the expression profiles of a group of genes that have been selected using other statistical methods.

We perform hierarchical clustering using Spotfire software.  If you would like us to perform hierarchical clustering on your data-set, just give us a list of genes to cluster and we’ll do the rest.



K-Means Clustering

K-Means clustering is a technique that is used to divide genes into a pre-specified number (k) of discrete groups.
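As a rough illustration of how genes can be divided into k groups, the sketch below alternates between assigning each gene to its nearest group centre and recomputing each centre as the mean of its members.  The function name k_means, the choice of the first k genes as starting centres (a simplification made so the example is repeatable; random starts are more common), and the six expression profiles are all assumptions for this example.

```python
def k_means(profiles, k, iterations=100):
    """Divide expression profiles into k groups (basic k-means sketch)."""
    # Simplifying assumption: start from the first k profiles.
    centroids = [list(p) for p in profiles[:k]]
    labels = [None] * len(profiles)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the old centroid if a group is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical expression profiles: 6 genes measured in 3 samples.
profiles = [
    [1.0, 1.1, 0.9],
    [5.0, 5.2, 4.9],
    [0.8, 1.0, 1.2],
    [5.1, 4.8, 5.0],
    [1.2, 0.9, 1.0],
    [4.9, 5.0, 5.3],
]
labels, centroids = k_means(profiles, k=2)
print("Group assigned to each gene:", labels)
```

The low-expression genes end up in one group and the high-expression genes in the other.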



Biological Data Mining and Pathway Analysis

EASE

GenMapp and MappFinder

Visualization

Introduction

Visualizations are often associated with the presentation of microarray data.

Heat Map

  The most common of these visualizations is the heat map. 

Volcano Plot

In a Volcano Plot, the fold change and significance for each gene are displayed as a scatter plot.  Both fold change and significance are generally plotted in log scale.  The spots take a characteristic volcano form because absolute fold change is correlated with significance.
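The coordinates of a volcano plot can be computed as in the sketch below.  It assumes the common convention of log2 fold change on the horizontal axis and -log10 of the p-value on the vertical axis.  The function names, the two-fold and p = 0.05 cut-offs, and the four example genes are hypothetical.

```python
import math


def volcano_coordinates(fold_changes, p_values):
    """Return one (log2 fold change, -log10 p-value) pair per gene."""
    return [(math.log2(fc), -math.log10(p))
            for fc, p in zip(fold_changes, p_values)]


def passes_cutoffs(point, min_abs_log2_fc=1.0, max_p=0.05):
    """True if a gene passes both the fold change and significance cut-offs.

    The defaults (two-fold change, p <= 0.05) are illustrative assumptions.
    """
    log2_fc, neg_log10_p = point
    return abs(log2_fc) >= min_abs_log2_fc and neg_log10_p >= -math.log10(max_p)


# Hypothetical fold changes (treated / control) and p-values for four genes.
fold_changes = [4.0, 0.25, 1.1, 2.0]
p_values = [0.001, 0.01, 0.6, 0.2]

points = volcano_coordinates(fold_changes, p_values)
selected = [passes_cutoffs(pt) for pt in points]
for (x, y), keep in zip(points, selected):
    print(f"log2 FC = {x:+.2f}, -log10 p = {y:.2f}, passes cut-offs: {keep}")
```

On the log scale a four-fold increase and a four-fold decrease sit at equal distances on either side of zero, which gives the plot its symmetric shape.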

Volcano plots can be used to demonstrate fold change and significance cut-offs.
Picture here
Volcano plots are also an excellent way to visualize the changes that occur in a group of genes.
Picture here

Talk about making volcano plots comparing more than two groups?

Pathway Visualization

GenMapp



Other Information

Oligo Design & Synthesis

The Microarray Resource will design oligonucleotide probes for detecting expression of specific genes of interest. This is not trivial as one must consider melting temperature, secondary structure, and sequence specificity, in addition to potential splice variants for each gene. We have automated many steps of this process.

The Microarray Resource will synthesize 50-70mers using the ABI 3900 DNA synthesizer at a rate of 100 oligos per day or more.  In contrast to cDNAs, which are commonly used as microarray probes, oligos provide flexibility to analyze the abundance of all mRNAs produced from a given gene.  One lesson from the large genome projects is that complexity may be generated, in part, by the surprisingly large number of mRNA splice variants derived from a single gene.  cDNA would not allow one to easily distinguish among different splice variants.

Some pre-designed gene sets (e.g. 100 tumor suppressor genes) will soon be listed on the website to provide a starting point for deciding which genes an investigator might want to analyze with their custom arrays.


Data Analysis
Data Warehousing & Data Analysis Consulting
Effectively managing and analyzing the large volume of data generated in each microarray experiment is a key factor in the successful use of the experimental approach.  One of our principal objectives as a microarray core facility is to provide software that will allow users of the Microarray Resource to get the most out of their data.

C++
Netaffx
Excel
Spotfire

 

For Microarray Resource customers, ANOVA is implemented within Microsoft Excel.

Assumptions/Limitations of ANOVA

 

With microarray data, biology researchers want to identify genes differentially expressed under different growth conditions or different treatments, to cluster genes according to their expression pattern, and to differentiate samples in pharmaceutical or clinical studies.

GeneChip probe design