Longitudinal studies
Statistic Corner

Edward Joseph Caruana1, Marius Roman1, Jules Hernández-Sánchez2, Piergiorgio Solli1

1Department of Thoracic Surgery, Papworth Hospital, Cambridge, UK; 2Research and Development, CTBI, Papworth Hospital, Cambridge, UK

Correspondence to: Piergiorgio Solli. Papworth Hospital NHS Foundation Trust, Papworth Everard Cambridgeshire, CB23 3RE, UK. Email: piergiorgio.solli@nhs.net.

Submitted Sep 19, 2015. Accepted for publication Oct 09, 2015.

doi: 10.3978/j.issn.2072-1439.2015.10.63


Longitudinal studies employ continuous or repeated measures to follow particular individuals over prolonged periods of time, often years or decades. They are generally observational in nature, with quantitative and/or qualitative data being collected on any combination of exposures and outcomes, without any external influence being applied. This study type is particularly useful for evaluating the relationship between risk factors and the development of disease, and the outcomes of treatments over different lengths of time. Similarly, because data are collected for given individuals within a predefined group, appropriate statistical testing may be employed to analyse change over time for the group as a whole, or for particular individuals (1).

In contrast, a cross-sectional analysis may examine multiple variables at a given instant but, being static by its very nature, provides no information on the influence of time on the variables measured. It is thus generally less valid for examining cause-and-effect relationships. Nonetheless, cross-sectional studies require less time to set up, and may be considered for preliminary evaluations of association prior to embarking on cumbersome longitudinal-type studies.

Longitudinal study designs

Longitudinal research may take numerous different forms. It is generally observational, but may also be experimental. Some of these forms are briefly discussed below:

  • Repeated cross-sectional studies where study participants are largely or entirely different on each sampling occasion;
  • Prospective studies where the same participants are followed over a period of time. These may include:
    • Cohort panels wherein some or all individuals in a defined population with similar exposures or outcomes are considered over time;
    • Representative panels where data is regularly collected for a random sample of a population;
    • Linked panels wherein data collected for other purposes is tapped and linked to form individual-specific datasets.
  • Retrospective studies are designed after at least some participants have already experienced events that are of relevance; with data for potential exposures in the identified cohort being collected and examined retrospectively.

Advantages and disadvantages


Longitudinal cohort studies, particularly when conducted prospectively in their pure form, offer numerous benefits. These include:

  • The ability to identify and relate events to particular exposures, and to further define these exposures with regards to presence, timing and chronicity;
  • Establishing sequence of events;
  • Following change over time in particular individuals within the cohort;
  • Excluding recall bias in participants, by collecting data prospectively and prior to knowledge of a possible subsequent event occurring, and;
  • Ability to correct for the “cohort effect”—that is allowing for analysis of the individual time components of cohort (range of birth dates), period (current time), and age (at point of measurement)—and to account for the impact of each individually.
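The separation of age, period and cohort rests on the identity age = period − birth year. The following is a minimal sketch, using hypothetical birth years and survey years, of why a single cross-section cannot separate age from cohort, whereas repeated follow-up observes each cohort at several ages:

```python
def age_at(period_year, birth_year):
    """Age at the point of measurement: period minus cohort (birth year)."""
    return period_year - birth_year


# Hypothetical birth cohorts and survey years, chosen only for illustration.
birth_years = [1940, 1950, 1960]
survey_years = [1990, 2000, 2010]

# Cross-sectional: one survey year, so each cohort is seen at exactly one age
# and any age difference is also a cohort difference.
cross_section = [(b, age_at(2000, b)) for b in birth_years]

# Longitudinal: every cohort is followed through several survey years.
longitudinal = [(b, p, age_at(p, b)) for b in birth_years for p in survey_years]

# Each cohort is now observed at several ages ...
ages_per_cohort = {
    b: sorted(a for (bb, _, a) in longitudinal if bb == b) for b in birth_years
}
# ... and the same age is observed in several cohorts (at different periods).
cohorts_at_age_50 = sorted(b for (b, _, a) in longitudinal if a == 50)
```

In this sketch, age 50 is reached by all three cohorts, each in a different survey year, which is what allows the three time components to be examined individually.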


Numerous challenges are implicit in the study design, particularly because it extends over protracted time periods. We briefly consider these below:

  • Incomplete and interrupted follow-up of individuals, and attrition with loss to follow-up over time; with notable threats to the representative nature of the dynamic sample if potentially resulting from a particular exposure or occurrence that is of relevance;
  • Difficulty in separation of the reciprocal impact of exposure and outcome, in view of the potentiation of one by the other; and particularly wherein the induction period between exposure and occurrence is prolonged;
  • The potential for inaccuracy in conclusion if adopting statistical techniques that fail to account for the intra-individual correlation of measures, and;
  • Generally-increased temporal and financial demands associated with this approach.
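As an illustration of the first point, attrition can be monitored by tracking retention at each wave and comparing the baseline characteristics of participants who were lost with those who remained. The sketch below uses hypothetical participant codes, wave numbers and exposure scores, and is one simple check rather than a formal test:

```python
def retention_by_wave(attendance, n_waves):
    """Fraction of the baseline cohort observed at each wave (1-indexed)."""
    n = len(attendance)
    return [
        sum(1 for waves in attendance.values() if w in waves) / n
        for w in range(1, n_waves + 1)
    ]


def mean(values):
    return sum(values) / len(values)


# Hypothetical data: waves attended and a baseline exposure score per participant.
attendance = {
    "P001": {1, 2, 3},
    "P002": {1, 2},
    "P003": {1},
    "P004": {1, 2, 3},
    "P005": {1, 3},  # interrupted follow-up: missed wave 2, returned at wave 3
}
baseline_exposure = {"P001": 2.0, "P002": 6.0, "P003": 7.0, "P004": 3.0, "P005": 4.0}

retention = retention_by_wave(attendance, n_waves=3)

final_wave = 3
completers = [baseline_exposure[p] for p, w in attendance.items() if final_wave in w]
dropouts = [baseline_exposure[p] for p, w in attendance.items() if final_wave not in w]

# A large gap suggests that loss to follow-up may be related to the exposure,
# which would threaten the representativeness of the remaining sample.
exposure_gap = mean(dropouts) - mean(completers)
```

Here the participants lost by the final wave had higher baseline exposure than those retained, which is the pattern that would raise concern about a non-representative remaining sample.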

Embarking on a longitudinal study

Conducting longitudinal research is demanding in that it requires an appropriate infrastructure that is sufficiently robust to withstand the test of time, for the actual duration of the study. It is essential that the methods of data collection and recording are identical across the various study sites, as well as being standardised and consistent over time. Data must be classified according to the interval of measure, with all information pertaining to particular individuals also being linked by means of unique coding systems. Recording is facilitated, and accuracy increased, by adopting recognised classification systems for individual inputs (2).
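A minimal sketch of such a record structure follows, assuming a hypothetical `Measurement` record in which a unique participant code links observations collected at different waves and sites. The field names and values are illustrative only:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    participant_id: str  # unique code linking all records for one individual
    wave: int            # measurement occasion (interval of measure)
    site: str            # study site where the data were recorded
    systolic_bp: float   # hypothetical outcome, same unit (mmHg) at every site


def link_by_participant(records):
    """Group records by participant code and order each group by wave."""
    linked = defaultdict(list)
    for rec in records:
        linked[rec.participant_id].append(rec)
    return {pid: sorted(recs, key=lambda r: r.wave) for pid, recs in linked.items()}


def change_from_baseline(linked):
    """Change between each individual's first and last available measurement."""
    return {
        pid: recs[-1].systolic_bp - recs[0].systolic_bp
        for pid, recs in linked.items()
        if len(recs) >= 2
    }


records = [
    Measurement("P001", 2, "site_b", 131.0),
    Measurement("P001", 1, "site_a", 124.0),
    Measurement("P002", 1, "site_a", 118.0),
    Measurement("P002", 3, "site_a", 120.0),
    Measurement("P003", 1, "site_b", 140.0),  # single wave: no change computable
]

linked = link_by_participant(records)
changes = change_from_baseline(linked)
```

Storing one row per participant per occasion (a "long" layout) keeps records from different sites and waves compatible, and allows individuals with missed or unequally spaced visits to remain in the dataset.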

Numerous variables are to be considered, and adequately controlled, when embarking on such a project. These include factors related to the population being studied, and their environment; wherein stability in terms of geographical mobility and distribution, coupled with an ability to continue follow-up remotely in case of displacement, are key. It is furthermore essential to appropriately weigh the various measures, and classify these accordingly so as to facilitate the allocation of effort at the data collection stage, and also guide the use of possibly limited funds (3). Additionally, the engagement and commitment of organisations contributing to the project is essential; and should be maintained and facilitated by means of regular training, communication and inclusion wherever possible.

The frequency and degree of sampling should vary according to the specific primary endpoints; and whether these are based primarily on absolute outcome or variation over time. Ethical and consent considerations are also specific to this type of research. All effort should be made to ensure maximal retention of participants; with exit interviews offering useful insight as to the reason for uncontrolled departures (3).

The Critical Appraisal Skills Programme (CASP) (4) offers a series of tools and checklists that are designed to facilitate the evaluation of scientific quality of given literature. This may be extrapolated to critically assess a proposed study design. Additional depth of quality assessment is available through the use of various tools developed alongside the Consolidated Standards of Reporting Trials (CONSORT) guidelines, including a structured 33-point checklist proposed by Tooth et al. in 2005 (5).

Following adequate design, the launch and implementation of longitudinal research projects may itself require a significant amount of time, particularly if being conducted at multiple remote sites. Time invested in this initial period will improve the accuracy of data eventually received, and contribute to the validity of the results. Regular monitoring of outcome measures, and focused review of any areas of concern, are essential (3). These studies are dynamic, and necessitate regular updating of procedures and retraining of contributors, as dictated by events.

Statistical analyses

The statistical testing of longitudinal data necessitates the consideration of numerous factors. Central amongst these are (I) the linked nature of the data for an individual, despite separation in time; (II) the co-existence of fixed and dynamic variables; (III) potential for differences in time intervals between data instances, and (IV) the likely presence of missing data (6).

Commonly applied approaches (7) are discussed below: (I) univariate (ANOVA) and multivariate (MANOVA) analysis of variance are often adopted for longitudinal analysis. Note that both assume equal interval lengths and a normal distribution in all groups, and that only means are compared, sacrificing individual-specific data; (II) mixed-effect regression models (MRM) focus specifically on individual change over time, whilst accounting for variation in the timing of repeated measures, and for missing or unequal data instances; and (III) generalised estimating equation (GEE) models rely on the independence of individuals within the population to focus primarily on regression data (6).
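To make the idea of individual change over time concrete, the sketch below uses a deliberately simpler technique than MRM or GEE: a two-stage "summary measures" analysis, in which a least-squares slope is fitted for each individual and the slopes are then summarised across the group. It is not a mixed-effect model, and the data and participant codes are hypothetical. It does, however, tolerate unequal timing and differing numbers of visits:

```python
from math import sqrt


def ols_slope(times, values):
    """Stage 1: least-squares slope of values on times for one individual."""
    t_bar = sum(times) / len(times)
    y_bar = sum(values) / len(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    return sxy / sxx


def summarise_slopes(series):
    """Stage 2: mean slope and its standard error across individuals."""
    # A slope needs at least two distinct time points for that individual.
    slopes = [ols_slope(t, y) for t, y in series.values() if len(set(t)) >= 2]
    n = len(slopes)
    mean_slope = sum(slopes) / n
    var = sum((s - mean_slope) ** 2 for s in slopes) / (n - 1)
    return mean_slope, sqrt(var / n), slopes


# Hypothetical data: (years since baseline, outcome) per participant, with
# unequal intervals and differing numbers of visits.
series = {
    "P001": ([0.0, 1.0, 2.0], [10.0, 12.0, 14.0]),
    "P002": ([0.0, 2.0, 5.0], [8.0, 10.0, 13.0]),
    "P003": ([0.0, 4.0], [9.0, 15.0]),
    "P004": ([0.0], [11.0]),  # one visit only: excluded from the slope summary
}

mean_slope, se_slope, slopes = summarise_slopes(series)
```

One limitation of this simplification is visible in the data: the participant with a single visit contributes nothing, and every slope is weighted equally regardless of how many visits support it. Mixed-effect models are generally preferred where such imbalance is substantial.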

With ever-growing computational abilities, the repertoire of statistical tests is ever expanding. In-depth understanding and appropriate selection are increasingly important to ensure meaningful results.

Common errors

Inaccuracies in the analysis of longitudinal research are rampant, and most commonly arise when repeated hypothesis testing is applied to the data, as it would be for cross-sectional studies. This leads to an underutilisation of available data, an underestimation of variability, and an increased likelihood of type II statistical error (false negative) (8).
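The underestimation of variability can be illustrated with a small, hypothetical dataset. The sketch below compares the standard error of the overall mean when repeated measures are wrongly pooled as independent observations with the standard error obtained from one summary value per individual. The figures are invented for illustration, and the comparison is a simplification rather than a full longitudinal analysis:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical repeated measures: 4 individuals, 3 occasions each.
# Values within an individual are similar (intra-individual correlation).
data = {
    "P001": [10.0, 11.0, 12.0],
    "P002": [20.0, 21.0, 22.0],
    "P003": [30.0, 31.0, 32.0],
    "P004": [40.0, 41.0, 42.0],
}

# Naive: pool all 12 values as if they came from 12 independent people.
pooled = [y for ys in data.values() for y in ys]
naive_se = stdev(pooled) / sqrt(len(pooled))

# Correlation-aware: one summary value per individual, so n = 4.
subject_means = [mean(ys) for ys in data.values()]
subject_se = stdev(subject_means) / sqrt(len(subject_means))

# Ratio above 1 means the naive analysis overstates precision.
overstatement = subject_se / naive_se
```

With these figures the naive standard error is roughly half the subject-level one, because the twelve observations carry little more information about between-person variation than four independent individuals would.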

Example: the Framingham heart study

The mid-20th century saw a steady increase in cardiovascular-associated morbidity and mortality, after improvements in sanitation and the introduction of penicillin in the 1940s had resulted in a significant decline in communicable disease. A drive to identify the risk factors for cardiovascular disease gave birth to the Framingham Heart Study in 1948 (9).

Numerous predisposing factors were postulated to align together to produce cardiovascular disease, with increasing age being considered a central determinant. These formed the basis for the hypothesis that underpinned this longitudinal study.

The Framingham study is widely recognised as the quintessential longitudinal study in the history of medical research. An original cohort of 5,209 subjects from Framingham, Massachusetts, aged between 30 and 62 years, was recruited and followed up for 20 years. A number of hypotheses were generated and described by Dawber et al. (10) in 1980, listing various presupposed risk factors such as increasing age, increased weight, tobacco smoking, elevated blood pressure, elevated blood cholesterol and decreased physical activity. It is widely cited as a successful longitudinal study, owing to the fact that a large proportion of the exposures chosen for analysis were indeed found to correlate closely with the development of cardiovascular disease.

A number of biases exist within the Framingham Heart Study. Firstly, it was carried out in a single population in a single town, bringing into question the generalisability and applicability of its data to different groups. However, Framingham was sufficiently diverse both in ethnicity and socio-economic status to mitigate this bias to a degree. Secondly, despite the initial intent of random selection, the investigators needed the addition of over 800 volunteers to reach the pre-defined target of 5,000 subjects, thus reducing the randomisation. They also found that their cohort was uncharacteristically healthy.

The Framingham Heart Study has given us invaluable data pertaining to the incidence of cardiovascular disease, and has further confirmed a number of risk factors. The success of this study was further potentiated by the absence of treatments or modifiers, such as statin therapy and anti-hypertensives. This has enabled the study to more clearly delineate the natural history of this complex disease process.


Longitudinal methods may provide a more comprehensive approach to research that allows an understanding of the degree and direction of change over time. One should carefully consider the cost and time implications of embarking on such a project, whilst ensuring complete and proven clarity in design and process, particularly in view of the protracted nature of such an endeavour, and noting the peculiarities for consideration at the interpretation stage.




Conflicts of Interest: The authors have no conflicts of interest to declare.


  1. Van Belle G, Fisher L, Heagerty PJ, et al. Biostatistics: A Methodology for the Health Sciences. Longitudinal Data Analysis. New York, NY: John Wiley and Sons, 2004.
  2. van Weel C. Longitudinal research and data collection in primary care. Ann Fam Med 2005;3 Suppl 1:S46-51.
  3. Newman AB. An overview of the design, implementation, and analyses of longitudinal studies on aging. J Am Geriatr Soc 2010;58 Suppl 2:S287-91.
  4. 12 questions to help you make sense of cohort study. Critical Appraisal Skills Programme (CASP) Cohort Study Checklist. Available online: http://media.wix.com/ugd/dded87_e37a4ab637fe46a0869f9f977dacf134.pdf
  5. Tooth L, Ware R, Bain C, et al. Quality of reporting of observational longitudinal research. Am J Epidemiol 2005;161:280-8.
  6. Edwards LJ. Modern statistical techniques for the analysis of longitudinal data in biomedical research. Pediatr Pulmonol 2000;30:330-44.
  7. Nakai M, Ke W. Statistical Models for Longitudinal Data Analysis. Applied Mathematical Sciences 2009;3:1979-89.
  8. Liu C, Cripe TP, Kim MO. Statistical issues in longitudinal data analysis for treatment efficacy studies in the biomedical sciences. Mol Ther 2010;18:1724-30.
  9. Dawber TR, Kannel WB, Lyell LP. An approach to longitudinal studies in a community: the Framingham Study. Ann N Y Acad Sci 1963;107:539-56.
  10. Dawber TR. The Framingham Study: The Epidemiology of Atherosclerotic Disease. Cambridge, Mass: Harvard University Press, 1980.
Cite this article as: Caruana EJ, Roman M, Hernández-Sánchez J, Solli P. Longitudinal studies. J Thorac Dis 2015;7(11):E537-E540. doi: 10.3978/j.issn.2072-1439.2015.10.63
