Study Design

 

 

 

Genetic Epidemiology of COPD (COPDGene®) Study Design

Elizabeth A. Regan MD, PhD 1, John E. Hokanson PhD 2, James R. Murphy PhD 1, Barry Make MD 1, David A. Lynch MD 1, Terri H. Beaty PhD 3, Douglas Curran-Everett PhD 1,2, Edwin K. Silverman MD 4, James D. Crapo MD 1 for the COPDGene® Investigators

 

 

 

Abstract:

Background: COPDGene® is a multicenter observational study designed to identify genetic factors associated with COPD. It will also characterize chest CT phenotypes in COPD subjects, including assessment of emphysema, gas trapping, and airway wall thickening. Finally, subtypes of COPD based on these phenotypes will be used in a comprehensive genome-wide study to identify COPD susceptibility genes.

Methods/Results: COPDGene will enroll 12,000 subjects, including smokers with and without COPD across the GOLD stages, along with a non-smoking control group. Both non-Hispanic White and African-American subjects are included in the cohort. Inspiratory and expiratory chest CT scans will be obtained on all participants. In addition to cross-sectional enrollment, these subjects will be followed regularly for longitudinal studies. A genome-wide association study (GWAS) will be performed on an initial group of 3000 subjects to identify genetic variants associated with case-control status and several quantitative phenotypes related to COPD. The initial findings will be verified in an additional 3000 COPD cases and 3000 smoking control subjects, and further candidate gene studies will be carried out.

Conclusions: COPDGene will provide important new information about genetic factors in COPD and will characterize the disease process using high-resolution CT scans. Understanding the genetic factors and CT phenotypes that define COPD may permit earlier diagnosis of this disease and may lead to treatments that modify its progression.

 

 

Introduction:

Chronic Obstructive Pulmonary Disease (COPD) is the fourth leading cause of death in the United States and an important public health issue [8].  An estimated 24 million individuals in the U.S. may be affected by COPD [12]. Both the number of affected individuals and the number of deaths from COPD are expected to increase as the population ages [11].  COPD is a heterogeneous condition, with a variety of disease-related phenotypes [3,16]. Better understanding of the disease mechanisms is needed to develop effective treatments and prevention strategies.  To accomplish this, we need improved understanding of the etiology of COPD, clinical classifications of the disease that are biologically and medically coherent, and knowledge of genetic factors that influence risk of COPD.  
COPD is strongly associated with smoking, but only a minority of smokers develop COPD, suggesting that genetic differences among individuals may make some smokers more susceptible to the adverse effects of cigarette smoke [10]. Relatives of COPD patients show an increased prevalence of airflow obstruction, which supports a role for genetic factors predisposing smokers to COPD [1,9,13]. Smokers with first-degree relatives affected by COPD have two to three times the risk of developing disease [13,17]. Genetic factors have been associated with response to lung volume reduction surgery [6], as well as with specific patterns of emphysema [2] and degree of functional impairment [5]. Estimated heritabilities for the decline in lung function with age, using parent-offspring pairs who both smoke, are 0.18 for FEV1 and 0.39 for FVC [4]. Because COPD likely results from multiple genes, some of which may interact with environmental risk factors (primarily smoking), heritability estimates that do not account for the effects of smoking on lung function are likely to underestimate the true genetic component of COPD.
The Genetic Epidemiology of COPD (COPDGene®) Study was designed to identify genetic factors in COPD, to define and characterize disease-related phenotypes, and to assess the association of these phenotypes with the identified susceptibility genes. This multicenter study is funded by the National Heart, Lung, and Blood Institute (NHLBI). A key feature of the project is the enrollment of a large cohort (12,000 subjects) spanning the full range of disease severity, with smokers and non-smokers without COPD serving as controls. Two groups are being studied: non-Hispanic Whites and non-Hispanic African Americans.

The primary goals of the study are:
1)     Phenotypic characterization of COPD subjects using computed tomography, as well as clinical and physiological measures, to separate the COPD syndrome into distinct subtypes that may be more etiologically homogeneous.
2)     Use of genome-wide association studies to provide insight into the genetic risk factors for COPD and its subtypes.

 

Methods:
Study Design
Specific Aim 1: Cohort Building. The enrollment target of 12,000 subjects comprises approximately 2/3 non-Hispanic White and 1/3 African-American subjects, distributed across the full spectrum of disease severity and both genders (Table 4). The cohort is being recruited specifically for a genome-wide association study (GWAS) and is large enough to provide adequate statistical power to detect genes exerting modest effects on risk.
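What "adequate statistical power" means can be made concrete with a standard back-of-envelope calculation. The sketch below approximates the power of a two-sided allelic (allele-count) case-control test at the conventional genome-wide threshold of 5 × 10^-8 using a normal approximation. The threshold, allele frequency, odds ratio, and sample sizes are assumptions for illustration; this is not the study's actual power analysis.

```python
from statistics import NormalDist


def allelic_power(p0, odds_ratio, n_cases, n_controls, alpha=5e-8):
    """Approximate power of a two-sided allelic case-control test.

    p0: risk-allele frequency in controls (an assumed input).
    odds_ratio: allelic odds ratio for the risk allele.
    Normal approximation with the variance evaluated under the alternative.
    """
    std_normal = NormalDist()
    # Case allele frequency implied by the allelic odds ratio.
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)
    # Each subject contributes two alleles to the allele counts.
    se = (p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls)) ** 0.5
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # The chance of rejecting in the wrong direction is negligible and ignored.
    return std_normal.cdf(abs(p1 - p0) / se - z_crit)


if __name__ == "__main__":
    # Hypothetical scenarios: allele frequency 0.05, allelic odds ratio 1.3.
    for n in (1500, 3000, 6000):
        print(n, round(allelic_power(0.05, 1.3, n, n), 3))
```

Power rises with sample size and effect size, which is why detecting modest effects at genome-wide significance requires cohorts in the thousands.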

Specific Aim 2: Characterization of Subtypes of COPD. COPD subtypes will be characterized mainly by the presence and severity of parenchymal and airway disease on inspiratory and expiratory high-resolution chest CT scans.

Specific Aim 3: Genome-Wide Association Study (see Figure 2). The initial study plan for the GWAS involves four phases. In Phase I, an initial GWAS will be performed on a balanced group of 3000 current or former smokers, comprising cases and controls (2000 White and 1000 African American). Statistical signals (SNPs in or between genes) identified in Phase I will be confirmed in Phase II with a custom SNP array that will provide greater coverage of genes. In Phase III, SNPs in genes or regions identified and confirmed in Phases I and II will be investigated with regional fine mapping and tests of association to identify causal genes. In Phase IV, the final group of candidate genes will be replicated in other COPD cohorts. With continued improvements in SNP genotyping technology, phases beyond Phase I may also be analyzed at the genome-wide level.
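The Phase I to Phase II logic can be sketched as a filter: rank Phase I signals, carry the top-ranked SNPs forward, and call a signal confirmed only if it is significant in Phase II with the same direction of effect. Ranking by p-value, the number of SNPs carried forward (`top_k`), and a Bonferroni correction over only the carried-forward SNPs are all assumptions of this sketch; the SNP names and values are hypothetical.

```python
def carry_forward(phase1, top_k):
    """Return the top_k SNP IDs from Phase I, ranked by p-value.

    phase1: dict mapping SNP ID -> (effect_estimate, p_value).
    """
    return sorted(phase1, key=lambda snp: phase1[snp][1])[:top_k]


def confirmed_in_phase2(phase1, phase2, selected, alpha=0.05):
    """SNPs significant in Phase II (Bonferroni over `selected`) with matching sign."""
    threshold = alpha / len(selected)
    hits = []
    for snp in selected:
        if snp not in phase2:  # e.g. the assay failed on the follow-up array
            continue
        beta1, _ = phase1[snp]
        beta2, p2 = phase2[snp]
        if p2 < threshold and beta1 * beta2 > 0:
            hits.append(snp)
    return hits


# Hypothetical results: (effect estimate, p-value) per SNP.
phase1 = {"snpA": (0.20, 1e-6), "snpB": (0.10, 1e-3),
          "snpC": (-0.30, 1e-5), "snpD": (0.05, 0.4)}
selected = carry_forward(phase1, top_k=2)              # ['snpA', 'snpC']
phase2 = {"snpA": (0.18, 0.001), "snpC": (0.25, 0.001)}
print(confirmed_in_phase2(phase1, phase2, selected))   # ['snpA']
```

Here snpC is significant in Phase II but its effect changes sign, so it is not counted as confirmed.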

Specific Aim 4: Natural history of COPD and Risk Factors for Progression.  The COPDGene cohort will be established for longitudinal follow-up with regular contact made to determine mortality, comorbid disease events and disease status based on clinical and/or chest CT evidence of progression.
Population:

Twenty-one clinical study centers (see Appendix 1) throughout the United States are enrolling participants under this protocol over a four-year period. Each study site has obtained local IRB approval to enroll participants in this project, and all subjects provide informed consent to participate in the study.

Inclusion and Exclusion Criteria:
The primary inclusion criteria are self-identified race/ethnicity as either non-Hispanic White or African American and age between 45 and 80 years; subjects over age 80 are excluded. All COPD cases and smoking controls reported at least 10 pack-years of smoking and could be current or former smokers. Non-smoker controls are also included in the study. Pregnant women are excluded because CT scans are part of the study protocol and represent an unacceptable risk to a fetus. Exclusion criteria also include a history of lung disease other than asthma (e.g., pulmonary fibrosis, extensive bronchiectasis, cystic fibrosis), previous surgical resection of at least one lung lobe, active cancer under treatment, suspected lung cancer (a large or highly suspicious lung mass), metal in the chest, and a recent COPD exacerbation treated with antibiotics or steroids. Subjects with a recent COPD exacerbation can be enrolled one month after the exacerbation.
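The inclusion and exclusion rules above can be encoded as a single screening check. This is a minimal sketch: the field names, the group labels, and the approximation of "one month" as 30 days are hypothetical choices, not the study's data dictionary.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for the two self-identified groups enrolled.
ELIGIBLE_GROUPS = {"non-Hispanic White", "African American"}


@dataclass
class Screening:
    age: int
    race_ethnicity: str
    smoker_group: bool                      # True for COPD cases and smoking controls
    pack_years: float = 0.0
    pregnant: bool = False
    other_lung_disease: bool = False        # asthma does NOT count here
    lobectomy_or_more: bool = False
    active_cancer_treatment: bool = False
    suspected_lung_cancer: bool = False
    metal_in_chest: bool = False
    days_since_treated_exacerbation: Optional[int] = None  # None: none recent


def is_eligible(s: Screening) -> bool:
    """Apply the inclusion/exclusion rules described in the protocol text."""
    if s.race_ethnicity not in ELIGIBLE_GROUPS or not 45 <= s.age <= 80:
        return False
    if s.smoker_group and s.pack_years < 10:
        return False
    if any((s.pregnant, s.other_lung_disease, s.lobectomy_or_more,
            s.active_cancer_treatment, s.suspected_lung_cancer,
            s.metal_in_chest)):
        return False
    # Deferral after a treated exacerbation; "one month" approximated as 30 days.
    if (s.days_since_treated_exacerbation is not None
            and s.days_since_treated_exacerbation < 30):
        return False
    return True
```

A subject deferred for a recent exacerbation becomes eligible once the deferral window has passed, matching the one-month rule.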
Smokers with a spirometric pattern that is unclassified by GOLD criteria, denoted GOLD U (normal FEV1/FVC but reduced FEV1), are eligible for the study but will be analyzed separately. Because a key goal of this project is to define COPD phenotypes as completely as possible, this group was retained so that the full breadth of smoking-related lung disease can be studied.
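GOLD categories drive both the case definition and the separate handling of GOLD U. A minimal classifier is sketched below, assuming post-bronchodilator values and the standard GOLD spirometric cut-offs (FEV1/FVC < 0.70 for obstruction; FEV1 ≥ 80%, 50-79%, 30-49%, and < 30% of predicted for Stages 1-4). Using 80% predicted as the boundary for "reduced FEV1" in GOLD U, and the label "normal" for normal spirometry, are assumptions of this sketch.

```python
def gold_category(fev1_fvc: float, fev1_pct_predicted: float) -> str:
    """Assign a spirometric GOLD category from post-bronchodilator values.

    fev1_fvc: FEV1/FVC as a fraction (e.g. 0.65).
    fev1_pct_predicted: FEV1 as percent of the predicted value.
    """
    if fev1_fvc < 0.70:                 # airflow obstruction present
        if fev1_pct_predicted >= 80:
            return "GOLD 1"
        if fev1_pct_predicted >= 50:
            return "GOLD 2"
        if fev1_pct_predicted >= 30:
            return "GOLD 3"
        return "GOLD 4"
    if fev1_pct_predicted < 80:         # preserved ratio but reduced FEV1
        return "GOLD U"
    return "normal"                     # normal spirometry


def is_copd_case(category: str) -> bool:
    """Case definition for the COPD status phenotype: GOLD Stages 2-4."""
    return category in {"GOLD 2", "GOLD 3", "GOLD 4"}
```

GOLD 1 and GOLD U subjects are classified but excluded from the case definition, consistent with their separate analysis.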
Individuals diagnosed with asthma, in either the COPD or smoking control groups, are included in COPDGene®. COPD subjects are often also diagnosed with asthma, so excluding asthmatics would misrepresent the distribution of COPD subjects. The number of asthmatics (defined by report of a physician diagnosis of asthma or by bronchodilator response on spirometry) in both case and control groups will be monitored throughout the study, and analyses performed both with and without asthmatics will assess the impact of the asthma phenotype on inferences from our genetic analyses.
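The asthma flag and the with/without-asthma analysis sets can be sketched as follows. The source does not state the bronchodilator-response criterion; this sketch assumes the commonly used ATS threshold of an increase of at least 12% and at least 200 mL in FEV1 or FVC. The record field names are hypothetical.

```python
def bronchodilator_response(pre_ml: float, post_ml: float) -> bool:
    """Assumed criterion: increase of >= 12% AND >= 200 mL over baseline."""
    change = post_ml - pre_ml
    return change >= 200 and change / pre_ml >= 0.12


def asthma_flag(subject: dict) -> bool:
    """Physician-diagnosed asthma or a bronchodilator response in FEV1 or FVC."""
    return (subject["physician_asthma_dx"]
            or bronchodilator_response(subject["pre_fev1_ml"], subject["post_fev1_ml"])
            or bronchodilator_response(subject["pre_fvc_ml"], subject["post_fvc_ml"]))


def analysis_sets(subjects: list) -> tuple:
    """Return (all subjects, subjects without asthma) for sensitivity analyses."""
    return subjects, [s for s in subjects if not asthma_flag(s)]
```

Running each genetic analysis on both sets, then comparing estimates, is one way to gauge how much the asthma phenotype influences the inferences.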

Imaging:
CT scans are acquired using multi-detector CT scanners (at least 16 detector channels). Volumetric CT acquisitions are obtained both at full inspiration (200 mAs) and at the end of normal expiration (50 mAs). Images are reconstructed at sub-millimeter slice thickness with both smooth and edge-enhancing algorithms. Detailed CT protocols are provided in Appendix 2.

Data Collection:
Each study subject undergoes pre- and post-bronchodilator spirometry using a standardized protocol and spirometer (ndd EasyOne™ Spirometer, Zurich, Switzerland). Information collected from each subject includes a modified American Thoracic Society (ATS) Respiratory Epidemiology Questionnaire, demographic information, medications, medical history, and the St George's Respiratory Questionnaire (SGRQ) (see the full data collection forms on the COPDGene website at www.COPDGene.org). Height, weight, blood pressure, and oxygen saturation are also assessed. A standardized six-minute walk test is performed on each subject. Inspiratory and expiratory CT scans are performed with defined protocols that produce images permitting measurement of airway wall thickness and of the extent and distribution of emphysema. A blood sample for DNA is obtained from each subject; DNA, serum, and plasma are stored for future biomarker studies at the COPDGene® Biorepository at Johns Hopkins University.

Recruitment:
Recruiting adequate numbers of subjects distributed across controls and the four COPD GOLD stages is key to the success of this study but poses a logistical challenge. Recruitment by age, gender, race, and disease status at each clinical study center is monitored in real time by the Data Coordinating Center (DCC). Regular review by the Administrative Core and Executive Committee allows the study as a whole to monitor recruitment of specific groups as needed. A Certificate of Confidentiality from the US Department of Health and Human Services was obtained at the onset of the study to provide additional protection for the research participants and for the genetic marker data subsequently generated from them.
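Real-time recruitment monitoring reduces to comparing enrolled counts per stratum with recruitment targets. A minimal sketch follows; the stratum labels and target numbers are hypothetical, and the study's actual targets are not reproduced here.

```python
from collections import Counter


def enrollment_gaps(enrolled, targets):
    """Remaining enrollment needed per stratum.

    enrolled: iterable of (race, disease_stratum) tuples, one per subject.
    targets:  dict mapping (race, disease_stratum) -> target count.
    Returns only the strata that are still below target.
    """
    counts = Counter(enrolled)
    return {stratum: target - counts[stratum]
            for stratum, target in targets.items()
            if counts[stratum] < target}


# Hypothetical targets and enrollment for two strata.
targets = {("AA", "GOLD 3"): 2, ("NHW", "GOLD 3"): 1}
print(enrollment_gaps([("NHW", "GOLD 3"), ("AA", "GOLD 3")], targets))
```

Reports of this kind can then direct recruitment effort toward under-filled strata.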

Data Management:
All study data are ultimately stored at the COPDGene Data Coordinating Center (DCC) in the Division of Biostatistics and Bioinformatics at National Jewish Health. Each site enters data through a web-accessible system. Eligibility is verified through a web-based questionnaire after subjects sign the research consent form, and subjects are tracked for completion of all study data. If a participant is excluded or discontinues during or after the study procedures, the specific reason for exclusion or discontinuation is recorded in the database.

Pulmonary Function Test (PFT) Core
The objective of the PFT Core is to ensure that pulmonary function data are of the highest quality through device specification, technician training, and standardized protocols. All spirometry data are collected using the ndd EasyOne Spirometer (Zurich, Switzerland), an ultrasound-based spirometer that uses a dual-beam Doppler approach to flow measurement and has a Windows-based software program to collect, calculate, and store final spirometry data.

Imaging Core
The Imaging Core is centered at National Jewish Health and works collaboratively with the Iowa Comprehensive Lung Imaging Center at the University of Iowa and imaging staff at Brigham and Women's Hospital. Research assistants log receipt of images, perform quality analysis, coordinate required readings, and assist with quantitative analysis. De-identified images are submitted to the Imaging Core on DVDs in DICOM format, with a study ID as the only identifier.

Sample Storage Core
The Sample Storage Core at Johns Hopkins University coordinates blood sample shipments and storage, using barcode labels and the Freezerworks database to track samples from intake through processing into serum, plasma, and DNA. It also maintains an inventory of all stored samples for the project. Each subject has a minimum of 50 micrograms of DNA stored, along with additional aliquots of buffy coat, plasma, and serum.

Training of Study Centers:
An initial training program was developed to ensure consistent data quality across the clinical study centers. Six major areas were identified for training: spirometry, subject data collection and data entry, participant eligibility, safety assessment and functional tests, CT scans, and blood/DNA collection and shipping. Training programs were made available on the study website so that each site could train new coordinators. After formal training, each coordinator's spirometry skills were assessed by requiring submission of test values and flow curves from 3 naïve subjects. Radiology technicians were trained at their local sites with a web-based program describing the CT scan methods, to provide uniformity in verbal instructions to subjects, performance of the CT protocol, and management of CT data.
Each Clinical Center was individually assessed for completion of all training activities. After obtaining final IRB approval for the project, the Clinical Center director and coordinator(s) participated in a teleconference with the Administrative Core to activate the Clinical Center. Two pilot subjects were enrolled at each site, and subject data collection, data transfer, CT methods, and shipping procedures were reviewed and approved before full enrollment began.

Quality control:
Each of the study Cores (PFT, Imaging, Biorepository, Genome-Wide Analysis, Candidate Genotyping, and the DCC) developed a plan for quality control of data handling. The DCC is the central storage site for all study data and leads the quality assurance (QA) program for data entry, including range checking of data on entry, multivariable validation, monthly reports on data quality, and maintenance of an audit trail of all data changes. Each study center is informed weekly about out-of-range data so that problems can be resolved rapidly. Data identified as out of range are reviewed by the Quality Control Committee and, when necessary, by the Adjudication Committee.
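Range checking and multivariable validation at data entry can be sketched as simple rules applied to each submitted record. The field names and plausibility limits below are hypothetical (only the age limits mirror the 45-80 year inclusion range), and the single cross-field rule is an illustrative example, not the DCC's actual rule set.

```python
# Hypothetical plausibility limits per field: (low, high), inclusive.
RANGES = {
    "age": (45, 80),
    "height_cm": (120, 220),
    "weight_kg": (30, 250),
    "spo2_pct": (70, 100),
    "fev1_l": (0.2, 7.0),
}


def out_of_range(record, ranges=RANGES):
    """Return (field, value) for each missing or implausible field."""
    flags = []
    for field, (lo, hi) in ranges.items():
        value = record.get(field)
        if value is None:
            flags.append((field, "missing"))
        elif not lo <= value <= hi:
            flags.append((field, value))
    return flags


def multivariable_checks(record):
    """Illustrative cross-field rule: FEV1 cannot exceed FVC."""
    fev1, fvc = record.get("fev1_l"), record.get("fvc_l")
    if fev1 is not None and fvc is not None and fev1 > fvc:
        return [("fev1_l > fvc_l", (fev1, fvc))]
    return []
```

Flags from both functions could feed the weekly out-of-range reports sent to each center.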

CT scan quality control
Quality assurance of CT images is multi-level. Each CT scan is visually inspected by the local clinical radiologist for adequate inspiration, absence of motion artifact, and inclusion of the entire chest. At the Imaging Core, a trained Professional Research Assistant evaluates the scan for technical completeness, compliance with protocol, adequacy of inspiration, and presence of motion artifact, and the quality of the automated airway segmentations is verified. Finally, the stability of CT measurements for each scanner used in the study is monitored by monthly scanning of a custom COPDGene phantom designed for this study.
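Monthly phantom scans turn scanner stability into a simple drift check: compare the mean attenuation measured in reference regions with their nominal values. This sketch assumes the phantom contains air and water-equivalent regions, whose nominal values (-1000 and 0 HU) follow from the definition of the Hounsfield scale; the tolerances and scanner ID are hypothetical, and the actual COPDGene phantom design is not described here.

```python
# Nominal values follow from the Hounsfield scale definition.
NOMINAL_HU = {"air": -1000.0, "water": 0.0}
# Hypothetical acceptance tolerances in HU; not the study's criteria.
TOLERANCE_HU = {"air": 10.0, "water": 5.0}


def drift_flags(scanner_id, monthly_means):
    """Flag months where a reference region deviates beyond tolerance.

    monthly_means: list of dicts, one per monthly phantom scan,
    mapping region name -> mean measured HU in that region.
    """
    flags = []
    for month, means in enumerate(monthly_means, start=1):
        for region, nominal in NOMINAL_HU.items():
            deviation = means[region] - nominal
            if abs(deviation) > TOLERANCE_HU[region]:
                flags.append((scanner_id, month, region, deviation))
    return flags


scans = [{"air": -998.0, "water": 1.5},
         {"air": -985.0, "water": 2.0},
         {"air": -1001.0, "water": -7.0}]
print(drift_flags("scanner-01", scans))
```

Drift in the air region matters directly for emphysema measures, which depend on voxel counts below fixed thresholds such as -950 HU.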
Analysis:

Phenotypes:
A large amount of phenotypic information is collected from study participants. To limit the multiple-testing burden in our GWA analysis, we will focus on five key phenotypes.
1.    Status of COPD - defined as GOLD Stages 2-4 in smokers. The absence of COPD is determined by normal spirometry in smokers.
2.    Airflow obstruction (post-bronchodilator FEV1 - used as a continuous variable in COPD cases)
3.    Emphysema (% of lung < -950 HU - used as a continuous variable in COPD cases)
4.    Air trapping on expiratory CT (% of lung < -856 HU - used as a continuous variable in COPD cases)
5.    Airway disease - wall area percent of the 4th and 5th generation airways.
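The quantitative CT phenotypes above reduce to simple voxel and cross-section arithmetic once the lung and airways are segmented. A minimal NumPy sketch follows, assuming an array of Hounsfield unit values for the segmented lung and airway cross-sectional areas from a separate airway analysis; it is not the VIDA software's implementation. Wall area percent is taken here as 100 × (outer area - lumen area) / outer area, the usual definition.

```python
import numpy as np


def percent_below(hu, threshold_hu):
    """Percent of segmented lung voxels with attenuation below threshold_hu.

    Apply to the inspiratory scan with -950 HU (emphysema) and to the
    expiratory scan with -856 HU (air trapping).
    """
    hu = np.asarray(hu)
    return 100.0 * np.count_nonzero(hu < threshold_hu) / hu.size


def wall_area_percent(outer_area_mm2, lumen_area_mm2):
    """Airway wall area as a percent of the total airway cross-section."""
    return 100.0 * (outer_area_mm2 - lumen_area_mm2) / outer_area_mm2


def mean_wall_area_percent(airways, generations=(4, 5)):
    """Average WA% over airways in the given generations.

    airways: iterable of (generation, outer_area_mm2, lumen_area_mm2).
    """
    values = [wall_area_percent(outer, lumen)
              for gen, outer, lumen in airways if gen in generations]
    return float(np.mean(values))
```

Restricting the average to 4th- and 5th-generation airways matches the airway phenotype defined in item 5.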

CT phenotyping:
The following analyses are performed on segmented lung images using VIDA software (VIDA Diagnostics, http://www.vidadiagnostics.com): total inspiratory and expiratory lung volumes, mean lung attenuation, and relative lung volumes (for the whole lung and for each lobe) falling below attenuation thresholds of -950, -910, and -856 HU. Emphysema distribution is assessed by comparing percent emphysema in central vs. peripheral lung and in upper vs. lower lobes. Automated airway segmentation and quantification are performed as described by Hoffman et al. [7]. For each bronchial tree, multiple parameters are calculated for third-, fourth-, fifth-, and sixth-generation bronchi, including wall area, lumen area, wall thickness, and luminal diameter.
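The lobar thresholds and the upper vs. lower lobe comparison can be sketched with a lobe label volume aligned to the CT image. The integer lobe codes, the grouping of upper and lower lobes (right middle lobe left out of both groups), and the use of a simple upper-to-lower ratio are assumptions for illustration; the central vs. peripheral comparison is not shown.

```python
import numpy as np

# Hypothetical integer codes in a lobe label volume (0 = outside lung).
LOBE_CODES = {"RUL": 1, "RML": 2, "RLL": 3, "LUL": 4, "LLL": 5}
UPPER_LOBES = ("RUL", "LUL")
LOWER_LOBES = ("RLL", "LLL")


def percent_below_in(hu, labels, lobes, threshold_hu=-950):
    """Percent of voxels below threshold_hu within the given lobes."""
    hu, labels = np.asarray(hu), np.asarray(labels)
    mask = np.isin(labels, [LOBE_CODES[name] for name in lobes])
    n_voxels = np.count_nonzero(mask)
    if n_voxels == 0:
        return float("nan")
    return 100.0 * np.count_nonzero(hu[mask] < threshold_hu) / n_voxels


def lobar_profile(hu, labels, threshold_hu=-950):
    """Per-lobe percent below threshold_hu (e.g. -950, -910, or -856 HU)."""
    return {name: percent_below_in(hu, labels, (name,), threshold_hu)
            for name in LOBE_CODES}


def upper_lower_ratio(hu, labels, threshold_hu=-950):
    """Upper-to-lower ratio of percent below threshold."""
    upper = percent_below_in(hu, labels, UPPER_LOBES, threshold_hu)
    lower = percent_below_in(hu, labels, LOWER_LOBES, threshold_hu)
    if lower == 0:
        return float("inf") if upper > 0 else float("nan")
    return upper / lower
```

A ratio above 1 indicates upper-lobe-predominant emphysema; values near 1 indicate a more homogeneous distribution.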

Genetic analysis plan:
COPDGene® will apply three general analytical strategies to the genome-wide SNP data, both to maximize statistical power for identifying disease susceptibility loci (DSL) and to minimize false positive results:
1.    Immediate identification of DSL achieving genome-wide significance (using methods for screening and testing in the same dataset [18]).
2.    Ranking of SNPs by estimated effect size for a two-stage design.
3.    Combining results across racial groups, either through meta-analytic techniques or by incorporating covariates that summarize genetic background.
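For the meta-analytic option in Strategy 3, a common choice (assumed here; the source does not name a method) is an inverse-variance weighted fixed-effect meta-analysis of per-group effect estimates. The sketch below pools, for one SNP, estimates from the non-Hispanic White and African-American analyses; the numbers are hypothetical, and heterogeneity testing is omitted.

```python
import math


def fixed_effect_meta(estimates):
    """Inverse-variance fixed-effect meta-analysis for one SNP.

    estimates: list of (beta, se) pairs, e.g. one per ancestry group.
    Returns (pooled_beta, pooled_se, two_sided_p).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return pooled, pooled_se, p


# Hypothetical per-group estimates (log odds ratio, standard error).
white = (0.15, 0.05)
african_american = (0.10, 0.08)
print(fixed_effect_meta([white, african_american]))
```

Weighting by inverse variance lets the larger, more precise group dominate the pooled estimate without discarding the smaller group.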
Genetic association tests will be performed for both qualitative and quantitative COPD-related phenotypes. Separate association analyses will be performed in the GOLD Stage 1 and GOLD U subjects to determine whether these subsets have significantly different distributions of disease-associated alleles and/or haplotypes compared with subjects in other GOLD stages.
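For a qualitative phenotype such as case-control status, the simplest single-SNP test is the 1-degree-of-freedom allelic chi-square test on a 2 × 2 table of allele counts. This is a sketch of that textbook test, assumed for illustration; it treats alleles as independent (Hardy-Weinberg equilibrium), and the source does not specify which test COPDGene will use. Quantitative phenotypes such as FEV1 or percent emphysema would typically be regressed on genotype instead, which is not shown.

```python
import math


def allelic_chi2(case_counts, control_counts):
    """1-df allelic chi-square test from allele counts.

    case_counts, control_counts: (count of allele A, count of allele a).
    Returns (chi2, p_value).
    """
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 df, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical allele counts from 200 cases and 200 controls (400 alleles each).
print(allelic_chi2((150, 250), (100, 300)))
```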

Data Sharing:
The resources and results of the COPDGene® study will be made available to other investigators in a manner that allows the broad scientific community to benefit from the work of this project while protecting the privacy and confidentiality of research subjects. The data sharing plan is to deposit all datasets (including genotype and phenotype data) in dbGaP (http://www.ncbi.nlm.nih.gov/sites/entrez/dbgap) as soon as possible after the data are verified to the standards described in the quality control section above.

 

 

Discussion:
We anticipate that COPDGene® will generate a unique, large cohort of well-phenotyped subjects for COPD research. The high level of phenotypic characterization will provide a valuable resource for studies of the genetics, epidemiology, and natural history of COPD. The genome-wide association (GWA) approach chosen for COPDGene® has the potential to identify genes influencing risk for complex diseases in a systematic and unbiased manner, without relying on our currently limited knowledge of pathophysiology to select candidate genes. Moreover, association studies may be able to detect genes of modest effect that cannot be identified using conventional linkage analysis [15]. The growing number of successful GWA studies demonstrates that this approach has the potential to provide new insights into complex diseases like COPD [14].
The relative importance of common vs. rare variants in the etiology of complex diseases remains a subject of debate. Common genetic variants are likely to contribute to the risk of complex diseases, although their individual effects may be quite modest, and multiple genes are likely to be involved. Rare genetic variants are also likely to contribute to risk; while their individual effects may be larger, their rarity in the population makes those effects difficult to identify and confirm in case-control designs. Identifying very rare genetic variants by association analysis is not practical because of the extremely large sample sizes needed; however, the sample sizes proposed in this project will enable us to identify relatively rare alleles (e.g., allele frequency as low as 0.05) associated with moderately increased disease risk. A major limitation of a single-phase GWA analysis is the unacceptable number of false positive SNPs identified simply because of the extremely large number of statistical tests conducted. The proposed multi-phase design is specifically intended to limit false positive findings while maximizing the number of true positives. Furthermore, we will compare results within the COPDGene cohort with those from other cohorts and family-based studies to replicate our findings.
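The scale of the false-positive problem follows from simple arithmetic: with m independent null tests at per-test level α, about m·α SNPs pass by chance, and the Bonferroni threshold α/m controls the family-wise error rate. For a hypothetical panel of 10^6 SNPs, a nominal α = 0.05 gives about 50,000 false positives, while Bonferroni correction gives a per-SNP threshold of 5 × 10^-8. The sketch below encodes this; because GWAS SNPs are correlated through linkage disequilibrium, the independence assumption makes Bonferroni conservative.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of null SNPs passing a per-test threshold alpha."""
    return n_tests * alpha


def bonferroni_threshold(n_tests: int, family_alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the family-wise error rate <= family_alpha."""
    return family_alpha / n_tests


n_snps = 1_000_000  # hypothetical number of SNPs tested
print(expected_false_positives(n_snps, 0.05))
print(bonferroni_threshold(n_snps))
```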
In addition to the analyses of the entire COPDGene population listed above, separate analyses of the GOLD Stage 1 cases will be performed. We will attempt to identify a normal subgroup and an early disease subgroup within this phenotypic category based on their CT emphysema, CT airway, and spirometric characteristics. The relationship of this "normal" subset to functional impairment and disease impact measures will be assessed. We hypothesize that individuals in the putative normal subgroup will have less functional impairment, less evidence of disease impact, and fewer exacerbations. Ultimately, longitudinal follow-up will be required to determine whether the hypothesized "normal" subgroup of GOLD 1 subjects is less likely to progress to more severe airflow obstruction. However, cross-sectional analysis of GOLD 1 subjects will determine whether clinical heterogeneity within this group can be discerned using CT data.
There are several future research opportunities generated by this study that will be important for the general pulmonary research community. First, functional variants in any susceptibility genes identified here will require further characterization. This will involve resequencing these genes to identify specific mutations, followed by biochemical or physiologic studies, including studies in animal models, to define the functional impact of these variants. Second, longitudinal investigation of all cases and controls recruited for COPDGene will provide new insights into the natural history, epidemiology, and even the genetic basis of COPD, including improved understanding of the GOLD 1 and GOLD U groups and assessment of risk factors for COPD progression, morbidity, and mortality.

 

Conclusion:
COPD is a disease with important public health implications, given its often profound effects on functional capacity, quality of life, and mortality. At present there is a dearth of effective treatments for moderate to severe disease and of effective secondary prevention strategies for early or occult disease. Further progress in these areas is hampered by the long latency between smoking exposure and the development of clinical disease, as well as by the relatively small proportion of smokers who develop symptomatic disease. Wide variation in disease expression (airway disease, emphysema, extrapulmonary effects, and patterns of exacerbations) may limit the statistical power of therapeutic trials to detect treatment effects within these subsets.
COPDGene®, with its large population and focus on CT phenotypes, aims to define subsets of COPD that may reflect the effects of specific genetic variants. Careful CT phenotyping may generate diagnostic imaging biomarkers and permit early disease identification in high-risk groups. Such early diagnosis of asymptomatic disease would provide new opportunities to develop prevention strategies and treatments that limit disease progression. The availability of such treatments would also spur new efforts to screen smokers for early disease, with continued emphasis on smoking cessation. The genetic associations expected from performing GWAS in this large cohort may reveal novel disease mechanisms and provide new opportunities for treatment and prevention.
Finally, the wealth of data accrued in COPDGene® will be stored and made available to the broader scientific community for future studies. This will include detailed phenotypic subject information, genome-wide genotype data, and CT imaging data.

 

 

Acknowledgments: The project described is being supported by Award Numbers U01HL089897 and U01HL089856 from the National Heart, Lung, and Blood Institute. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Heart, Lung, and Blood Institute or the National Institutes of Health.

This work is also supported by the Monfort Family Foundation and by the COPD Foundation.  AstraZeneca Pharmaceuticals LP, Novartis Pharmaceuticals Corporation, and Sepracor Inc are ongoing supporters of the project through the COPDGene Industry Advisory Group.

The authors listed have all participated fully in the design of the study protocol, the implementation of the study, and the writing and review of the manuscript.
Appendix 1.  COPDGene® Investigators
Appendix 2.  CT scanning protocol

Reference List

(1)     Cohen BH, Ball WC Jr, Brashears S et al. Risk factors in chronic obstructive pulmonary disease (COPD). Am J Epidemiol 1977; 105(3):223-32
(2)     DeMeo DL, Hersh CP, Hoffman EA et al. Genetic determinants of emphysema distribution in the national emphysema treatment trial. Am J Respir Crit Care Med 2007; 176(1):42-8
(3)     Friedlander AL, Lynch D, Dyar LA et al. Phenotypes of chronic obstructive pulmonary disease. COPD 2007; 4(4):355-84
(4)     Gottlieb DJ, Wilk JB, Harmon M et al. Heritability of longitudinal change in lung function. The Framingham study. Am J Respir Crit Care Med 2001; 164(9):1655-9
(5)     Hersh CP, DeMeo DL, Lazarus R et al. Genetic association analysis of functional impairment in chronic obstructive pulmonary disease. Am J Respir Crit Care Med 2006; 173(9):977-84
(6)     Hersh CP, DeMeo DL, Reilly JJ et al. Xenobiotic metabolizing enzyme gene polymorphisms predict response to lung volume reduction surgery. Respir Res 2007; 8:59
(7)     Hoffman EA, Simon BA, McLennan G. State of the Art. A structural and functional assessment of the lung via multidetector-row computed tomography: phenotyping chronic obstructive pulmonary disease. Proc Am Thorac Soc 2006; 3(6):519-32
(8)     Hoyert DL, Arias E, Smith BL et al. Deaths: final data for 1999. Natl Vital Stat Rep 2001; 49(8):1-113
(9)     Kueppers F, Miller RD, Gordon H et al. Familial prevalence of chronic obstructive pulmonary disease in a matched pair study. Am J Med 1977; 63(3):336-42
(10)     Lokke A, Lange P, Scharling H et al. Developing COPD: a 25 year follow up study of the general population. Thorax 2006; 61(11):935-9
(11)     Mannino DM, Buist AS. Global burden of COPD: risk factors, prevalence, and future trends. Lancet 2007; 370(9589):765-73
(12)     Mannino DM, Homa DM, Akinbami LJ et al. Chronic obstructive pulmonary disease surveillance--United States, 1971-2000. MMWR Surveill Summ 2002; 51(6):1-16
(13)     McCloskey SC, Patel BD, Hinchliffe SJ et al. Siblings of patients with severe chronic obstructive pulmonary disease have a significant risk of airflow obstruction. Am J Respir Crit Care Med 2001; 164(8 Pt 1):1419-24
(14)     Pillai SG, Ge D, Zhu G et al. A genome-wide association study in chronic obstructive pulmonary disease (COPD): identification of two major susceptibility loci. PLoS Genet 2009; 5(3):e1000421
(15)     Risch NJ, Zhang H. Mapping quantitative trait loci with extreme discordant sib pairs: sampling considerations. Am J Hum Genet 1996; 58(4):836-43
(16)     Silverman EK. Exacerbations in chronic obstructive pulmonary disease: do they contribute to disease progression? Proc Am Thorac Soc 2007; 4(8):586-90
(17)     Silverman EK, Chapman HA, Drazen JM et al. Genetic epidemiology of severe, early-onset chronic obstructive pulmonary disease. Risk to relatives for airflow obstruction and chronic bronchitis. Am J Respir Crit Care Med 1998; 157(6 Pt 1):1770-8
(18)     Van Steen K, McQueen MB, Herbert A et al. Genomic screening and replication using the same data set in family-based association testing. Nat Genet 2005; 37(7):683-91