Four tips for RT-qPCR data normalization using reference genes

Barbara D'haene - Oct 23, 2013

A measured difference in RNA expression level between two samples reflects both true biological variation and experimentally induced (technical) variation. Several variables inherent to the RT-qPCR workflow need to be controlled in order to minimize the technical variation. Influencing parameters include the amount and quality of starting material, enzymatic efficiencies, and overall transcriptional activity.

It is highly recommended to minimize the technical variation by using standard operating procedures throughout the entire qPCR workflow. The remaining technical variation should then be further reduced or removed by using a proper normalization approach, enabling a better appreciation of the true biological variation.

The use of multiple stable reference (or housekeeping) genes is generally accepted as the method of choice for RT-qPCR data normalization (Vandesompele et al., Genome Biology, 2002; Bustin et al., Clinical Chemistry, 2009). qbase+ greatly facilitates validating reference genes and performing state-of-the-art normalization using the geometric mean of multiple validated reference genes.
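To make geometric-mean normalization concrete, here is a minimal Python sketch. It is not qbase+ code: the function names, the default efficiency of 2.0 (100% PCR efficiency) and the Cq values are hypothetical, and scaling each gene to its own mean Cq across samples is just one common convention. The relative quantities of the gene of interest are divided, sample by sample, by a normalization factor equal to the geometric mean of the reference genes' relative quantities.

```python
import statistics

def relative_quantities(cqs, efficiency=2.0):
    """Relative quantities for one gene, one value per sample.

    Each sample is expressed relative to the mean Cq of that gene across all
    samples: RQ = E ** (mean_cq - cq), where E is the amplification factor
    per cycle (2.0 corresponds to 100% PCR efficiency).
    """
    mean_cq = statistics.fmean(cqs)
    return [efficiency ** (mean_cq - cq) for cq in cqs]

def normalize(goi_cqs, ref_cqs_by_gene, goi_eff=2.0, ref_effs=None):
    """Normalized relative quantities (NRQ) for one gene of interest.

    The normalization factor of a sample is the geometric mean of the
    relative quantities of all reference genes in that sample.
    """
    ref_effs = ref_effs or {}
    goi_rq = relative_quantities(goi_cqs, goi_eff)
    ref_rqs = [relative_quantities(cqs, ref_effs.get(name, 2.0))
               for name, cqs in ref_cqs_by_gene.items()]
    # zip(*ref_rqs) regroups the values per sample across reference genes.
    factors = [statistics.geometric_mean(values) for values in zip(*ref_rqs)]
    return [rq / nf for rq, nf in zip(goi_rq, factors)]

# Hypothetical Cq values for three samples; sample 2 received twice the input.
nrq = normalize([25, 23, 26], {"REF_A": [20, 19, 20], "REF_B": [22, 21, 22]})
print([round(x, 3) for x in nrq])  # → [1.0, 2.0, 0.5]
```

In this made-up data set, both reference genes come up one cycle earlier in sample 2 (twice the input), so the apparent 4-fold increase of the gene of interest is corrected to 2-fold. Geometric averaging keeps a single outlying or highly abundant reference gene from dominating the normalization factor.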

Before walking you through the tips, I want to mention that we recently recorded a free training video on how to select good reference genes using geNorm.

According to the MIQE guidelines (Bustin et al., Clinical Chemistry, 2009), genes used for normalization should be referred to as reference genes, not as housekeeping genes. Therefore, I will use the term reference genes in the remainder of this article.

1. Normalization using multiple validated reference genes gives much more accurate results

All too often, a single reference gene is used for normalization. A normalization strategy based on a single non-validated reference gene leads to normalization errors of up to 3-fold in 25% of cases and up to 6.4-fold in 10% of cases, with sporadic errors above 20-fold.

In general, it is recommended to use between two and five validated, stably expressed reference genes for normalization. Unfortunately, single non-validated reference genes continue to be used on the untested assumption that they are stably expressed.

A geNorm pilot study, in which the stability of a panel of candidate reference genes (typically eight) is evaluated in a representative set of samples (typically ten), is the preferred way to determine the best set and the required number of reference genes. As long as the experimental conditions do not change, the results of such a pilot study can be used to achieve optimal normalization in all future studies. Normalization using multiple stably expressed reference genes gives results with higher statistical significance and enables detection of small expression differences.
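The geNorm M value used in such a pilot study can be sketched as follows, based on the published definition (Vandesompele et al., Genome Biology, 2002) rather than on the qbase+ implementation. For every pair of candidate genes, the pairwise variation is the standard deviation of their log2 expression ratios across samples; a gene's M is the average of its pairwise variations, and the least stable gene is removed step by step. The function names and data are hypothetical, using the sample standard deviation is an assumption of this sketch, and the pairwise variation analysis that determines the required number of genes is not included.

```python
import math
import statistics

def genorm_m(rq_by_gene):
    """geNorm-style stability measure M for each candidate gene.

    rq_by_gene maps gene name -> relative quantities (one per sample, same
    sample order for every gene). For each pair (j, k), V_jk is the standard
    deviation of log2(rq_j / rq_k) across samples; M_j is the mean of V_jk
    over all other genes k. Lower M means more stable expression.
    """
    m = {}
    for j in rq_by_gene:
        pairwise = []
        for k in rq_by_gene:
            if k == j:
                continue
            log_ratios = [math.log2(a / b)
                          for a, b in zip(rq_by_gene[j], rq_by_gene[k])]
            pairwise.append(statistics.stdev(log_ratios))
        m[j] = sum(pairwise) / len(pairwise)
    return m

def stepwise_exclusion(rq_by_gene):
    """Drop the least stable gene (highest M) until two genes remain.

    Returns the excluded genes with their M at exclusion time, and the
    final (most stable) pair in alphabetical order.
    """
    remaining = dict(rq_by_gene)
    excluded = []
    while len(remaining) > 2:
        m = genorm_m(remaining)
        worst = max(m, key=m.get)
        excluded.append((worst, round(m[worst], 3)))
        del remaining[worst]
    return excluded, sorted(remaining)

# Hypothetical relative quantities for four samples: A and B co-vary
# perfectly, C does not follow them.
candidates = {"A": [1, 2, 4, 1], "B": [2, 4, 8, 2], "C": [1, 1, 1, 1]}
print(stepwise_exclusion(candidates))  # → ([('C', 0.957)], ['A', 'B'])
```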

Since 2010, an improved version of geNorm has been integrated in qbase+.


2. Normalization with multiple reference genes enables quality control on the stability of their expression

Despite the validation of reference genes in a geNorm pilot study (typically preceding the larger final study), it remains crucially important to confirm the stability of the selected reference genes in the actual experiment containing all samples being studied. Stability can be assessed by calculating the geNorm M value (M) of each reference gene or the coefficient of variation of its normalized relative quantities (CV). These values can then be compared against empirically determined thresholds for acceptable stability (Hellemans et al., Genome Biology, 2007).
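The CV check can be sketched in a few lines of Python. This is a hypothetical helper, not the qbase+ calculation: it assumes relative quantities as input, computes the CV on the linear scale, and uses a normalization factor that includes every reference gene (including the one being evaluated). All names and values are invented for illustration.

```python
import statistics

def reference_cvs(rq_by_ref):
    """CV (%) of each reference gene's normalized relative quantities.

    rq_by_ref maps reference gene -> relative quantities (same sample order).
    Each reference gene is divided by the per-sample geometric mean of all
    reference genes; a perfectly stable set yields the same normalized value
    in every sample and therefore a CV of 0%.
    """
    factors = [statistics.geometric_mean(values)
               for values in zip(*rq_by_ref.values())]
    cvs = {}
    for gene, rqs in rq_by_ref.items():
        normalized = [rq / nf for rq, nf in zip(rqs, factors)]
        cvs[gene] = 100 * statistics.stdev(normalized) / statistics.fmean(normalized)
    return cvs

# Hypothetical data: REF_C does not follow REF_A and REF_B.
print(reference_cvs({"REF_A": [1.0, 2.0, 4.0],
                     "REF_B": [2.0, 4.0, 8.0],
                     "REF_C": [1.0, 1.0, 1.0]}))
```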

For normalization of genomic DNA (e.g. for gene copy number analysis), very low values should be obtained (M < 0.2, CV < 10%). Acceptable values in gene expression studies depend on the heterogeneity of the sample set. For homogeneous samples, such as cell cultures of the same cell type, M should be lower than 0.5 and the CV should be below 25%. In more heterogeneous sample sets (e.g. when comparing different cell types, clinical biopsies, cancer tissue in general), reference genes are typically more variable; here, we advise aiming for M < 1 and CV < 50%. Importantly, this stability evaluation can only be performed if multiple reference genes have been used for normalization, as it is based on pairwise variation analysis. If you cannot meet the proposed M value requirements, we recommend evaluating more or alternative candidate reference genes, or assessing your workflow to remove experimentally induced variation wherever possible (e.g. by standardizing the quality of your RNA samples or by following the sample maximization procedure described in the next tip).
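The thresholds above can be encoded as a simple lookup. This is a hypothetical helper, not a qbase+ feature: it flags a reference gene when either its M value or its CV reaches the limit for the chosen sample set. The dictionary keys, gene names and values are invented for the example.

```python
# Upper limits (M, CV in %) as quoted in the text; the keys are
# hypothetical labels chosen for this sketch.
THRESHOLDS = {
    "genomic_dna": (0.2, 10.0),    # e.g. gene copy number analysis
    "homogeneous": (0.5, 25.0),    # e.g. cell cultures of one cell type
    "heterogeneous": (1.0, 50.0),  # e.g. different cell types, biopsies, cancer tissue
}

def unstable_references(m_values, cv_values, sample_set):
    """Return the reference genes whose M or CV reaches the limit for sample_set."""
    max_m, max_cv = THRESHOLDS[sample_set]
    return [gene for gene in m_values
            if m_values[gene] >= max_m or cv_values[gene] >= max_cv]

m = {"REF1": 0.42, "REF2": 0.61, "REF3": 0.38}
cv = {"REF1": 18.0, "REF2": 27.5, "REF3": 15.2}
print(unstable_references(m, cv, "homogeneous"))    # → ['REF2']
print(unstable_references(m, cv, "heterogeneous"))  # → []
```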

3. There is no need for the reference gene(s) to be measured in the same plate (run) as the gene of interest

In a relative quantification study, the experimenter is usually interested in comparing the expression level of a particular gene among different samples. Reliable estimates of relative expression levels can only be obtained by preventing or reducing technical variation between samples and measurements. It is well recognized that differences in the input amount of target nucleic acids between samples need to be corrected through normalization, typically using one or multiple reference genes. Run-to-run variation within a series of measurements for a given gene is a second type of technical variation that needs to be avoided, minimized or corrected.

I advise you to follow the sample maximization method (Figure 1) whenever possible: all samples (or as many as possible) should be analyzed for a given gene in the same run. This strategy does not suffer from induced technical (run-to-run) variation between samples, because all samples are measured in the same run for a particular gene, and therefore does not require inter-run calibration. Since reference genes quantify the relative concentration of nucleic acids between samples independently of the other assays, there is no need to repeat the reference gene measurements in every run. In conclusion, there is no need for the reference gene(s) to be measured in the same plate as the gene of interest.
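To see why sample maximization usually needs fewer reactions, the sketch below counts reactions for the Figure 1 design (11 samples, 6 genes, duplicates, 1 negative control per gene). The gene maximization parameters are assumptions not taken from the figure: a 6 + 5 split of the samples over two runs, 2 inter-run calibrator samples repeated in the second run, and a negative control for every gene in every run. The function names are hypothetical.

```python
def sample_maximization_reactions(n_samples, n_genes, replicates, n_ntc=1):
    """Reactions when each gene is measured on all samples in a single run."""
    return n_genes * (n_samples + n_ntc) * replicates

def gene_maximization_reactions(samples_per_run, n_genes, replicates, n_irc, n_ntc=1):
    """Reactions when each run holds all genes for a subset of the samples.

    Assumptions (hypothetical, for illustration): every run needs its own
    no-template control(s) for each gene, and n_irc inter-run calibrator
    samples from the first run are repeated, for every gene, in each later run.
    """
    per_run = sum(n_genes * (n + n_ntc) * replicates for n in samples_per_run)
    repeats = (len(samples_per_run) - 1) * n_irc * n_genes * replicates
    return per_run + repeats

# Figure 1 experiment: 11 samples, 6 genes, duplicates, 1 NTC per gene.
print(sample_maximization_reactions(11, 6, 2))             # → 144
# Hypothetical gene maximization split: 6 + 5 samples, 2 calibrators.
print(gene_maximization_reactions([6, 5], 6, 2, n_irc=2))  # → 180
```

Under these assumptions, the 36 extra reactions come from the repeated calibrators (24) and the second set of negative controls (12).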

In other words, it is perfectly fine to normalize a gene of interest measured with a hydrolysis probe on qPCR instrument X using a reference gene measured with SYBR Green I on qPCR instrument Y. If you, like most RT-qPCR users, are interested in relative quantification, you only need to make sure that all samples are treated equally for a given gene; different genes may be measured differently.
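This works because each gene is quantified relative to itself across samples, so a systematic Cq shift that affects all samples of one gene equally (a different instrument, chemistry or threshold setting) cancels out. The sketch below checks this numerically. It uses a single reference gene only for brevity, and the Cq values, the 3.7-cycle offset and the efficiencies (1.95 and 1.9) are hypothetical.

```python
import math
import statistics

def relative_quantities(cqs, efficiency=2.0):
    """RQ = E ** (mean_cq - cq), computed for one gene across all samples."""
    mean_cq = statistics.fmean(cqs)
    return [efficiency ** (mean_cq - cq) for cq in cqs]

def nrq(goi_cqs, ref_cqs, goi_eff=2.0, ref_eff=2.0):
    """Gene of interest divided by ONE reference gene (single reference for brevity)."""
    goi = relative_quantities(goi_cqs, goi_eff)
    ref = relative_quantities(ref_cqs, ref_eff)
    return [g / r for g, r in zip(goi, ref)]

goi_cqs = [24.1, 22.9, 25.3]                  # hypothetical: probe assay, instrument X
ref_cqs = [19.8, 19.5, 20.4]                  # hypothetical: SYBR Green I, instrument Y
ref_cqs_other = [cq + 3.7 for cq in ref_cqs]  # same samples, systematic Cq offset

a = nrq(goi_cqs, ref_cqs, goi_eff=1.95, ref_eff=1.9)
b = nrq(goi_cqs, ref_cqs_other, goi_eff=1.95, ref_eff=1.9)
print(all(math.isclose(x, y) for x, y in zip(a, b)))  # → True
```

The same cancellation is why the absolute expression level of a reference gene does not matter, as long as it is stable.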

4. A good reference gene does not have to be expressed at the same level as the gene of interest

This common misconception probably stems from the era when researchers were performing Northern blots and wanted to detect the expression levels of both the gene of interest and the reference gene in one exposure. However, because RT-qPCR measures gene expression over a very wide linear dynamic range, there is no need for the reference genes to be expressed at levels similar to those of the genes of interest. A reference gene has only one requirement: it must be stably expressed.

Figure 1: Sample maximization (left) versus gene maximization (right) run setup strategies for an experiment with 11 samples (S1-S11), 1 negative control (W) and 6 genes (3 genes of interest (GOI) and 3 reference genes (REF)), all measured in duplicate.
In the gene maximization strategy, it is recommended to repeat a few samples in both runs (so-called inter-run calibrator samples) in order to detect and remove inter-run variation (Hellemans et al., Genome Biology, 2007). In general, the sample maximization strategy is preferred (no sample-related inter-run variation, easier setup, fewer reactions).
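Inter-run calibration for the gene maximization strategy can be sketched as below, in the spirit of Hellemans et al. (2007) but not as a copy of the qbase+ implementation. For one gene, each run gets a calibration factor equal to the geometric mean of its calibrator quantities, and all quantities in that run are divided by it. Function names, sample names and values are hypothetical; run 2 carries a made-up 1.5-fold bias.

```python
import statistics

def calibrate_runs(rq_by_run, irc_samples):
    """Remove a multiplicative run effect for ONE gene using inter-run calibrators.

    rq_by_run maps run id -> {sample: relative quantity}. Every run must
    contain all samples listed in irc_samples.
    """
    pooled = {}
    for rqs in rq_by_run.values():
        # Calibration factor of this run: geometric mean of its calibrators.
        factor = statistics.geometric_mean([rqs[s] for s in irc_samples])
        for sample, value in rqs.items():
            pooled.setdefault(sample, []).append(value / factor)
    # Calibrators occur in every run; combine their now comparable values.
    return {sample: statistics.geometric_mean(v) for sample, v in pooled.items()}

runs = {
    "run1": {"S1": 1.0, "IRC1": 4.0, "IRC2": 2.0},
    "run2": {"S2": 3.0, "IRC1": 6.0, "IRC2": 3.0},  # 1.5-fold run bias
}
calibrated = calibrate_runs(runs, ["IRC1", "IRC2"])
print(round(calibrated["S2"] / calibrated["S1"], 3))  # → 2.0
```

Without calibration, S2 would appear 3-fold higher than S1; after dividing out the run factors, the true 2-fold difference is recovered.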
