RMarkdown, Data and Templates

Creating RMarkdown documents from data by using and styling templates

I’m currently working on an internal project at BCU around narrating the progress of a long-term University initiative. In a nutshell, the task is to collate a large number of documents from different projects and produce something that will enable readers to easily navigate to reports, videos, links and other material related to their particular area of interest in the initiative.

My plan is to achieve this through the creation of an R Markdown file which can be rendered as an HTML page, and my intention is for a database to sit behind that document to power what people see. The advantage of approaching the project in this way is that additions and removals of materials related to the initiative need only be made in the database; the R Markdown file can remain untouched.

In theory this approach seems sound, but I first needed to examine whether I could indeed produce such an outcome. This post will detail the basics of that exploratory process, and hopefully it will be useful to you if you are considering a similar project.

In the next section, I produce a basic working example of how a dataset and a Markdown template can be used together to produce an HTML document populated by the contents of a database. Further down, I then add some additional elements to both the database and template in order to create some basic styling, and I then demonstrate how filtering on the dataset could enable the creation of different thematic sections of the resulting document.

The basis for the work presented here is this post on Stack Overflow, which provided me with the code needed to produce a basic, working version. In later sections I add further elements to that code in order to explore document styling and thematic automation.

I hope you find it useful!

Dynamic Document Creation

In the example below, the code first generates a dataset called input comprising four rows of random information populated across three variables:

  • name: A categorical variable (A:D)
  • data: A numerical variable of values between 0 and 1
  • text: A text string

Next, a template is created for sections within the document.

Finally, a for loop iterates across each row within input and populates the template in three places:

  • Section Title is populated with the contents of name.
  • Line 1 adds the contents of data.
  • Line 2 adds the contents of text.

The principle being demonstrated here is that a more complex document can be organised along similar lines, provided that the dataset and template are created in the correct way.

The code below is marked with the three stages described above. The result is displayed beneath.

### 1: GENERATE THE DATASET 'INPUT' AND POPULATE WITH RANDOM DATA
input <- data.frame(
  name = LETTERS[1:4],
  data = runif(n = 4),
  text = replicate(4, paste(sample(x = LETTERS, size = 10, replace = TRUE), collapse = "")),
  stringsAsFactors = FALSE)

### 2: CREATE A TEMPLATE
template <- "#### This is section %s
Section data is `%0.2f`.

Additional section text is: %s.

"
### 3: A FOR LOOP ITERATES THROUGH EACH LINE OF THE INPUT DATASET AND PRODUCES 4 SECTIONS
for (i in seq_len(nrow(input))) {
  current <- input[i, ]
  cat(sprintf(template, current$name, current$data, current$text))
}

This is section A

Section data is 0.27.

Additional section text is: DUTIUMANMZ.

This is section B

Section data is 0.18.

Additional section text is: IVPDBPAGYJ.

This is section C

Section data is 0.92.

Additional section text is: FBCXUMXUNQ.

This is section D

Section data is 0.72.

Additional section text is: RBPGNOAMTF.
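
As a side note, the loop is not strictly required: `sprintf()` is vectorised over its arguments, so a single call can return one filled-in section per row. The sketch below assumes fixed values in place of the random data (so that the result is reproducible) and writes the template on one line with `\n` escapes. It also assumes the code runs in an R Markdown chunk with the knitr option `results='asis'`, which is what allows `cat()` output to be rendered as Markdown rather than shown as console text.

```r
# Same shape as the dataset above, with fixed values instead of random ones.
input <- data.frame(
  name = LETTERS[1:4],
  data = c(0.27, 0.18, 0.92, 0.72),
  text = c("DUTIUMANMZ", "IVPDBPAGYJ", "FBCXUMXUNQ", "RBPGNOAMTF"),
  stringsAsFactors = FALSE
)

# The same template as above, written with explicit newline escapes.
template <- "#### This is section %s\nSection data is `%0.2f`.\n\nAdditional section text is: %s.\n\n"

# sprintf() recycles over the column vectors: one string per row of input.
sections <- sprintf(template, input$name, input$data, input$text)

# Write all sections out; in a chunk this needs results='asis' to render.
cat(sections, sep = "")
```

The loop version remains easier to extend when each row needs extra logic, so this is an alternative rather than a replacement.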

Adding Stylistic Elements

You can see from above that four document sections were created automatically from just a few lines of code. However, these were very basic. In the example below, the original code is augmented in three ways:

  • some images are added to the dataset and template
  • the data variable is displayed within a coloured box by adding CSS to the template
  • the text variable is used as a hyperlink out to the BCU website

First, new elements are added to the input dataframe for the template to call upon:

  • image - the names of images in the document directory (these are basic images I created for this example, displaying A-D)
  • image_caption - a unique caption for each image
  • hyperlink - for the purpose of this example, all rows in input will link to the BCU site, but these can be individual project websites, or other types of links.

### 1: ADD ADDITIONAL ELEMENTS TO THE INPUT DATABASE
input <- data.frame(
  name = LETTERS[1:4],
  data = runif(n = 4),
  text = replicate(4, paste(sample(x = LETTERS, size = 10, replace = TRUE), collapse = "")),
  stringsAsFactors = FALSE)

input$image <- c("imageA.png", "imageB.png", "imageC.png", "imageD.png")
input$image_caption <- c("Image A Caption", "Image B Caption", "Image C Caption", "Image D Caption")
input$hyperlink <- "https://www.bcu.ac.uk"

Second, the stylistic elements we want to incorporate can be added to the template:

### 2: AMEND THE TEMPLATE
template <- "#### This is section %s with automated styling applied

<div class = 'blue'>
Section <b>data</b> is now in a blue box `%0.2f`.
</div>
Additional section <b>text</b> is now a hyperlink to the BCU website: [%s.](%s){target='_blank'}

And we now have <b>images</b> with their <b>image_captions</b> underneath.
![%s](/img/%s)

"

Finally, the new elements of input are added to the for loop and called to populate the template.

### 3: A FOR LOOP ITERATES THROUGH EACH LINE OF THE INPUT DATASET AND PRODUCES 4 SECTIONS
for (i in seq_len(nrow(input))) {
  current <- input[i, ]
  cat(sprintf(template, current$name, current$data, current$text, current$hyperlink, current$image_caption, current$image))
}

This is section A with automated styling applied

Section data is now in a blue box 0.27.

Additional section text is now a hyperlink to the BCU website: LNOLIYWLSF.

And we now have images with their image_captions underneath. Image A Caption

This is section B with automated styling applied

Section data is now in a blue box 0.17.

Additional section text is now a hyperlink to the BCU website: OLMGORCNRP.

And we now have images with their image_captions underneath. Image B Caption

This is section C with automated styling applied

Section data is now in a blue box 0.63.

Additional section text is now a hyperlink to the BCU website: CBCIHAQOSF.

And we now have images with their image_captions underneath. Image C Caption

This is section D with automated styling applied

Section data is now in a blue box 0.45.

Additional section text is now a hyperlink to the BCU website: QGLRCSVPWK.

And we now have images with their image_captions underneath. Image D Caption
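
The template refers to a `blue` class, but the CSS rule behind it is not shown in the code above. One way to supply it, sketched here under assumptions, is to write a `<style>` block into the document from a chunk with `results='asis'`, placed before the loop. The colours, border and padding below are placeholder values, not the ones used for this post.

```r
# Hypothetical styling for the 'blue' class used in the template.
# The real colours and spacing are not shown in the post; these are placeholders.
blue_css <- paste(
  "<style>",
  ".blue {",
  "  background-color: #e6f0ff;",
  "  border: 1px solid #99bbff;",
  "  border-radius: 5px;",
  "  padding: 10px;",
  "}",
  "</style>",
  sep = "\n"
)

# Emit the raw HTML; in an R Markdown chunk this needs results='asis'.
cat(blue_css, "\n")
```

The same rule could instead live in a `css` chunk or in an external stylesheet referenced from the document's YAML header, which keeps styling separate from the R code.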

Filtering the Dataset to Produce Thematic Sections

At the moment, the template calls all rows of the input dataset, but for the project I am working on we will want to display documents and other information according to particular themes, strands, or timelines. This can be achieved by filtering the input dataset according to a given set of conditions. In the final example below, two further variables are added to the input dataset:

  • theme - entries are either in the Apple or Orange theme
  • year - entries are either from 2020 or 2021

I have entered data into these rows so that only the second line of the input dataset fulfills the following conditions: theme = Apple & year = 2021.

### 1: CREATE INPUT DATAFRAME 
input <- data.frame(
  name = LETTERS[1:4],
  data = runif(n = 4),
  text = replicate(4, paste(sample(x = LETTERS, size = 10, replace = TRUE), collapse = "")),
  stringsAsFactors = FALSE)

input$image <- c("imageA.png", "imageB.png", "imageC.png", "imageD.png")
input$image_caption <- c("Image A Caption", "Image B Caption", "Image C Caption", "Image D Caption")
input$hyperlink <- "https://steamhouse.org.uk"
input$theme <- c("Apple", "Apple", "Orange", "Orange")
input$year <- c("2020", "2021", "2020", "2020")

# 2: FILTER INPUT BASED ON CONDITIONS
library(dplyr) # provides %>% and filter()
input_new <- input %>%
  filter(theme == "Apple" & year == "2021")

# 3: CREATE TEMPLATE
template <- "#### This is section %s with automated styling applied

Because input was filtered according to the contents of <b>theme</b> and <b>year</b> the only output is the second line of data. Everything else remains constant. 

<div class = 'blue'>
Section <b>data</b> is now in a blue box `%0.2f`.
</div>

Additional section <b>text</b> is now a hyperlink to the SteamHouse website: [%s.](%s){target='_blank'}

And we now have <b>images</b> with their <b>image_captions</b> underneath.
![%s](/img/%s)
"

# 4: RUN LOOP
for (i in seq_len(nrow(input_new))) {
  current <- input_new[i, ]
  cat(sprintf(template, current$name, current$data, current$text, current$hyperlink, current$image_caption, current$image))
}

This is section B with automated styling applied

Because input was filtered according to the contents of theme and year the only output is the second line of data. Everything else remains constant.

Section data is now in a blue box 1.00.

Additional section text is now a hyperlink to the SteamHouse website: RMXYTYAWOI.

And we now have images with their image_captions underneath. Image B Caption
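
The same idea extends from one filtered section to a whole set of thematic sections. The sketch below is a minimal illustration rather than the project's actual code: `render_theme` is a hypothetical helper, the `### Theme:` heading and the shortened template are assumptions, and base R subsetting stands in for `filter()` so that the example has no package dependencies.

```r
# A cut-down version of the input dataset, keeping only what the sketch needs.
input <- data.frame(
  name  = LETTERS[1:4],
  theme = c("Apple", "Apple", "Orange", "Orange"),
  year  = c("2020", "2021", "2020", "2020"),
  stringsAsFactors = FALSE
)

# Shortened, hypothetical template: section name and year only.
template <- "#### This is section %s (%s)\n\n"

# Hypothetical helper: a theme heading followed by one section per matching row.
render_theme <- function(df, theme_name) {
  rows <- df[df$theme == theme_name, , drop = FALSE]
  if (nrow(rows) == 0) {
    return(character(0)) # nothing to show for an empty theme
  }
  c(
    sprintf("### Theme: %s\n\n", theme_name),
    sprintf(template, rows$name, rows$year)
  )
}

# Build every thematic block in the order the themes first appear.
output <- unlist(lapply(unique(input$theme), function(th) render_theme(input, th)))

# In an R Markdown chunk this needs results='asis' to render as Markdown.
cat(output, sep = "")
```

Returning an empty vector for a theme with no rows means that an unused theme produces no stray heading in the rendered page.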

Next Steps

The workflow above provides a basic overview of how to use a dataset to create an R Markdown document from just a few lines of code. In order to progress my project to the next phase, consideration now needs to be given to how the dataset that sits behind the document is to be arranged. For instance, information related to themes, timelines etc., will need to be carefully inputted.
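
As a rough sketch of what careful inputting might involve, the example below reads a dataset from a file and checks it before any sections are generated. Everything specific here is an assumption for illustration: the CSV file is a temporary stand-in for whatever database is eventually used, and the required columns and the list of allowed themes are hypothetical.

```r
# Hypothetical stand-in for the real database: a small CSV in a temp file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "name,hyperlink,theme,year",
  "A,https://www.bcu.ac.uk,Apple,2020",
  "B,https://www.bcu.ac.uk,Apple,2021"
), csv_path)

# Read every column as text so that year can be compared as a string.
input <- read.csv(csv_path, stringsAsFactors = FALSE, colClasses = "character")

# Stop early if a column the template relies on is missing.
required <- c("name", "hyperlink", "theme", "year")
missing_cols <- setdiff(required, names(input))
if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}

# Flag rows whose theme is not in the agreed list (e.g. a misspelling).
allowed_themes <- c("Apple", "Orange")
bad_rows <- input[!input$theme %in% allowed_themes, , drop = FALSE]
if (nrow(bad_rows) > 0) {
  warning("Unrecognised theme in rows: ", paste(bad_rows$name, collapse = ", "))
}
```

Checks of this kind matter because a filter silently drops any row whose theme or year does not match exactly, so a typing error in the data would simply make an item disappear from the page.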

As a basic start, however, the work above demonstrates that it is, at least, technically possible to work through the project in the manner I’d hoped.

If you have any questions about this process, feel free to drop me a line.

Craig Hamilton
Research Fellow

My research interests include popular music, digital humanities and online cultures