R (r-project.org) is a programming language and software platform for statistical computing and graphics, widely used in academia and industry (see Introduction to R). RStudio is an integrated development environment for R. RStudio makes R easier to use, and it also enables the creation and rendering of plain-text documents that contain embedded R code. With RStudio, you can encapsulate the code and data for your analysis within the text of your paper, fostering research transparency and replicability of results. An increasing number of scholarly journals are requiring that authors submit such replication materials as a condition of publication (see, for example, The AJPS Replication Policy: Innovations and Revisions), and are providing guidelines for data archiving in support of reproducible research (e.g., Reproducible research and Biostatistics and The Role of Data Repositories in Reproducible Research).
RStudio can also be used to insert literature citations into your text and produce formatted bibliographies, using R Markdown, an R-flavored variant of the Markdown language, and the BibTeX bibliographic system. RStudio has also recently developed R Notebooks, which are R Markdown documents that provide a rich workflow for interactive data analysis. R Markdown documents and R Notebooks can both be rendered into publication-quality output in a variety of formats, including HTML, PDF, and Microsoft Word. All of these tools are free and run on all major platforms (Windows, macOS, and Linux).
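As a rough illustration of what such a document can look like, here is a minimal R Markdown sketch. The title, author, the file name `references.bib`, and the citation key `examplekey` are hypothetical placeholders, not taken from any real paper; the key would need a matching entry in your own BibTeX file. The built-in `cars` data set is used only so that the code chunk runs without external data.

````markdown
---
title: "A Hypothetical Paper"
author: "A. Author"
output: html_document
bibliography: references.bib
---

# Introduction

Prose and code live in the same plain-text file. A citation is written
as a bracketed key [@examplekey], which must match an entry in
references.bib (a placeholder file name).

# Analysis

The built-in `cars` data set has `r nrow(cars)` observations.

```{r speed-vs-distance}
# Summarize the data, then plot stopping distance against speed
summary(cars)
plot(cars)
```

# References
````

Changing the `output` field to `pdf_document` or `word_document` should produce PDF or Word output instead (PDF output assumes a LaTeX installation is available). The file can be rendered with the Knit button in RStudio or, assuming it is saved as `paper.Rmd`, with `rmarkdown::render("paper.Rmd")` from the R console. The formatted bibliography is appended at the end of the document, which is why the final `# References` heading is left empty.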
Reproducible Research
In an 18-minute video, J.J. Allaire, Founder and CEO of RStudio, states:
Those who receive the results of modern data analysis have limited opportunity to verify the results by direct observation. Users of the analysis have no option but to trust the analysis, and by extension the software that produced it. This places an obligation on all creators of software to program in such a way that the computations can be understood and trusted.
This leads to the concept of reproducible research, “the idea that data analyses, and more generally, scientific claims, are published with their data and software code so that others may verify the findings and build upon them. The need for reproducibility is increasing dramatically as data analyses become more complex, involving larger datasets and more sophisticated computations. Reproducibility allows for people to focus on the actual content of a data analysis, rather than on superficial details reported in a written summary. In addition, reproducibility makes an analysis more useful to others because the data and code that actually conducted the analysis are available.”
The author of What is reproducible research? lists the following criteria:
A study can be truly reproducible when it satisfies at least the following three criteria.
– All methods are fully reported.
– All data and files used for the analysis are (publicly) available.
– The process of analyzing raw data is well reported and preserved.
An excellent reference is Reproducible Research with R and RStudio, Second Edition by Christopher Gandrud. The author has freely provided this book in reproducible form. Precompiled PDF versions are also available online.
This post will demonstrate the use of RStudio as a platform for the production of transparent, reproducible research. RStudio supports a plain-text workflow in which you can write, cite the literature, produce formatted bibliographies, perform statistical analyses, create graphics, and execute code in R and several other programming languages, all from a single plain-text document. Because the document contains only plain text, it is future-proof, easy to archive and share, editable on any type of computing device, and fully compatible with version control systems.