A Replicable Method for Gathering, Analysing and Visualising Podcast Episode and Review Data - Part 4

Rate and Review - Part 4

This post is the final in a series of four that present a workflow for gathering, analysing and visualising data about podcasts. The work presented here builds on data gathered in Part 1, analysed in Part 2 and visualised in Part 3. It is recommended that you complete the work in those previous posts before attempting the work in this one.

This post takes a dataset created in parts 1-3 and produces this interactive document that will enable researchers to explore the contents of the dataset and the results of analysis.

For those interested in researching podcasts, following the entire workflow through to the creation of a similar interactive document will allow the researcher to filter the dataset according to a number of parameters, creating new visualisations and database views each time, alongside providing a means to read in detail the contents of reviews gathered.

1 Getting started and general background

The code discussed in this tutorial creates a Shiny Markdown document. If you are unfamiliar with this format, I would recommend you read this tutorial before starting.

In addition, I would also recommend that you view this excellent tutorial video on YouTube by Sharon Machlis. The video is a great walkthrough of the rudiments of a Shiny Document and it really helped me get to grips with the process of coding the different parts. Even if you already have some experience with coding in R, or even of making Shiny Applications, there are a couple of very subtle differences in coding Shiny Documents, and Sharon’s video really helped me to understand some of the basic mistakes I made when I was developing my own document.

All of which is to say, I won’t be going into too much detail about how and why Shiny documents work in the way that they do. The links above will explain that in a much more concise and reliable way. Rather, I will simply walk through the elements of the document I have created and try to explain some of the decisions I made.

On a more general note, the broader Shiny universe is something I am still getting to grips with, and one of the exciting things about it is how quickly you can learn new and improved ways of doing things. To that end, I would recommend following Jesse Mostipak on Twitter. Jesse is a Developer Advocate at RStudio and runs a series of free webinars that I have found really helpful. So much so, in fact, that one of my motivations for posting this series of posts was - strange as it may seem - to publicly record where I’m at right now in my own development, because the webinars organised by Jesse have already made me realise that I could be doing things better (more efficiently, more elegantly, etc.). As such, I hope to revisit the code in this and other posts in future to see if I can improve upon it.

In the meantime, the document I have created using the code below is available to view at this link. As outlined in Part 1, the overall purpose of this work was to support a journal article I am writing with a colleague. The article narrates the processes of collecting and analysing data about podcasts, but it was this final part that really helped us in terms of exploring the data and results when we came to write up our findings.

2 YAML header

In your R environment, create a new markdown document and copy the code below.

The first part of the code is the YAML header, which contains title, author and date information, along with some commands that tell R and Markdown that this is a Shiny document.

---
title: "THE TITLE OF YOUR DOCUMENT"
author: "THE NAME OF THE AUTHOR"
date: "14/08/2021"
output:
  html_document:
    df_print: paged
    toc: true
runtime: shiny
---

3 Load packages and data

In the next part of your code you will set up the environment your document will operate within. This code will not be visible in the eventual rendered document. Here, the packages required will be loaded alongside the data generated in Part 3 of this tutorial.

Note the ## signs in the first and last lines of code. These are here because this blog post is also a markdown document and I am using similar commands in the background. The lines themselves are required whenever you want to display or run code in a markdown document, so if you are copying this code remember to remove the ## signs from them. See the aforementioned tutorial from Sharon Machlis on how best to render these initial commands.

##```{r setup, include=FALSE}## - remove the hash characters from this line before copying the code.
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(DT)
library(lubridate)
library(shinyWidgets)
library(ggplot2)
library(RColorBrewer)
reviews <- readRDS('reviews.rds')
reviews$date_only <- date(reviews$date)
##```## - remove the hash characters from this line before copying the code.
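
Because every later chunk depends on particular columns being present in the reviews data, it can be worth checking for them straight after loading. The following is only a minimal sketch in base R: `missing_columns()` and `toy_reviews` are hypothetical names of my own, and the toy values are invented. The column names themselves are the ones used in the code throughout this post.

```r
# Column names that the later chunks in this post refer to.
required_cols <- c("podcast", "short_name", "Rating", "topicsLDA7",
                   "main_topic_name", "wordsinreview", "country", "date")

# Hypothetical helper: returns any required columns that are absent.
missing_columns <- function(df, required = required_cols) {
  setdiff(required, names(df))
}

# Toy stand-in for readRDS('reviews.rds'); the values are invented.
toy_reviews <- data.frame(
  podcast = "Example Podcast",
  short_name = "Example",
  Rating = 5,
  topicsLDA7 = "Topic 1",
  main_topic_name = "Music",
  wordsinreview = 42,
  country = "gb",
  date = as.POSIXct("2021-08-14 10:30:00", tz = "UTC"),
  stringsAsFactors = FALSE
)

# Nothing is missing from the complete toy data.
complete_check <- missing_columns(toy_reviews)      # character(0)

# Dropping a column makes the helper report it.
no_country <- toy_reviews[, names(toy_reviews) != "country"]
dropped_check <- missing_columns(no_country)        # "country"
```

In your own document you would call the helper on the `reviews` object rather than on the toy data, and stop early if it returns anything.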

4 Create Filters

Having loaded the packages and data in the background, you can then create a number of user inputs that will subset the database and render visualisations.

The filters below allow the user to subset the reviews database by adding or removing podcasts, ratings, review topics, main topics (see Part 3), the word count of reviews, the country of review, and the date of review.

By using these parameters in combination, a user could drill down to reviews for a given podcast, within a given topic, within a given date range. And so on.

##```{r user_select, echo=FALSE} ## remove hashtags
fluidRow(
  column(3, pickerInput("podcast", "Choose Podcast:", choices = sort(unique(reviews$podcast)), options = list(`actions-box` = TRUE), multiple = TRUE, selected = sort(unique(reviews$podcast)))),
  column(3, pickerInput("rating", "Choose Rating:", choices = sort(unique(reviews$Rating)), options = list(`actions-box` = TRUE), multiple = TRUE, selected = sort(unique(reviews$Rating)))),
  column(3, selectInput("review_topic", "Choose Review Topic:", choices = sort(unique(reviews$topicsLDA7)), multiple = TRUE, selected = sort(unique(reviews$topicsLDA7)))),
  column(3, selectInput("topic", "Choose Main Topic:", choices = sort(unique(reviews$main_topic_name)), multiple = TRUE, selected = sort(unique(reviews$main_topic_name))))
)
fluidRow(
  column(4,
         sliderInput("words", "Word count:", min = min(reviews$wordsinreview), max = max(reviews$wordsinreview), value = c(min(reviews$wordsinreview), max(reviews$wordsinreview)))),
  column(4, pickerInput("country", "Choose Country:", choices = sort(unique(reviews$country)), options = list(`actions-box` = TRUE), multiple = TRUE, selected = c("gb", "us"))),
  column(4,
         sliderInput("date", "Select date range:", min = min(reviews$date_only), max = max(reviews$date_only), value = c(min(reviews$date_only), max(reviews$date_only)), timeFormat="%Y-%m-%d"))
)
##```## remove hashtags
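
Each filter above takes its choices and default values directly from the data. As a rough illustration of what those expressions evaluate to, here is a small base R sketch on an invented toy data frame (`toy_reviews` is a stand-in of my own, not the real dataset). It also shows that `range()` gives the same `c(min, max)` pair that the sliders build by hand.

```r
# Toy stand-in for the reviews data; values are invented for illustration.
toy_reviews <- data.frame(
  country = c("us", "gb", "us", "ca"),
  wordsinreview = c(12, 85, 40, 7),
  stringsAsFactors = FALSE
)

# Choices for a picker: each distinct value once, in alphabetical order.
country_choices <- sort(unique(toy_reviews$country))

# Default slider value: the full span of the data.
word_range <- range(toy_reviews$wordsinreview)

# The same pair written out as in the sliderInput() calls above.
word_range_by_hand <- c(min(toy_reviews$wordsinreview),
                        max(toy_reviews$wordsinreview))
```

Deriving the choices this way means the filters update themselves whenever the underlying dataset gains a new podcast, country or topic.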

5 Create Reactive dataframe

The following code takes the user inputs from the filters added above and creates a new version of the reviews database loaded earlier each time the parameters change. If no changes are made to a given filter, the pre-selected values above will be used.

##```{r reviewdata, echo=FALSE}##
reviews_data <- reactive({
  reviews %>%
    filter(date_only >= min(input$date)) %>%
    filter(date_only <= max(input$date)) %>%
    filter(wordsinreview >= min(input$words)) %>%
    filter(wordsinreview <= max(input$words)) %>%
    filter(Rating %in% input$rating) %>% 
    filter(main_topic_name %in% input$topic) %>%
    filter(country %in% input$country) %>%
    filter(podcast %in% input$podcast) %>%
    filter(topicsLDA7 %in% input$review_topic)
})
##```##
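
If you want to check the filtering logic without running the whole document, the same conditions can be tried out as an ordinary function. This is only a sketch under my own assumptions: `filter_reviews()` and `toy_reviews` are hypothetical, it is written in base R rather than with `filter()`, and it covers only four of the inputs to keep it short.

```r
# Hypothetical helper mirroring part of the reactive expression above,
# written in base R so it runs without Shiny or the tidyverse.
filter_reviews <- function(df, date_range, word_range, ratings, countries) {
  keep <- df$date_only >= min(date_range) &
    df$date_only <= max(date_range) &
    df$wordsinreview >= min(word_range) &
    df$wordsinreview <= max(word_range) &
    df$Rating %in% ratings &
    df$country %in% countries
  df[keep, , drop = FALSE]
}

# Toy data; the values are invented for illustration.
toy_reviews <- data.frame(
  Rating = c(5, 1, 4),
  country = c("gb", "us", "gb"),
  wordsinreview = c(20, 150, 60),
  date_only = as.Date(c("2020-01-10", "2020-06-01", "2021-03-15")),
  stringsAsFactors = FALSE
)

# All filters at their widest setting: every row is kept.
all_rows <- filter_reviews(toy_reviews,
  date_range = range(toy_reviews$date_only),
  word_range = range(toy_reviews$wordsinreview),
  ratings = 1:5, countries = c("gb", "us"))

# Narrowing to "gb" reviews of at most 50 words leaves one row.
gb_short <- filter_reviews(toy_reviews,
  date_range = range(toy_reviews$date_only),
  word_range = c(0, 50),
  ratings = 1:5, countries = "gb")

# An empty selection (NULL) matches nothing, so no rows come back.
none <- filter_reviews(toy_reviews,
  date_range = range(toy_reviews$date_only),
  word_range = range(toy_reviews$wordsinreview),
  ratings = 1:5, countries = NULL)
```

The last case is worth knowing about: as far as I understand it, a multiple-selection input with nothing selected arrives as `NULL`, and `%in%` against `NULL` is `FALSE` for every row, so deselecting everything in one filter empties the table and plots rather than causing an error.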

6 Create Dataframe view

This code takes the reactive dataframe created above and displays it as a Data Table. Because the rendered datatable is based on the reactive dataframe, which in turn is recreated each time a user alters the parameters of one or more filters, this datatable will change each time a user makes a selection.

##```{r reviewtable, echo=FALSE}##
renderDT({
  reviews_data_new <- reviews_data()
  reviews_data_table <- reviews_data_new[, c(2, 9, 10, 12, 17, 47)]
  DT::datatable(reviews_data_table,
                options = list(pageLength = 5), 
                rownames = FALSE,
                colnames = c("Podcast", "Rating", "Review", "Country", "Review Topic","Podcast Main Topic"), escape = FALSE)
})
##```##
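
The table code above picks its columns by position, which works as long as the column order of the dataset never changes. A possible alternative is to pick them by name. The sketch below uses an invented toy data frame; `Review` in particular is a hypothetical column name, since the name of the review text column in the real dataset is not shown in this post.

```r
# Toy data; the values and the "Review" column name are assumptions.
toy_reviews <- data.frame(
  id = 1:2,
  podcast = c("Podcast A", "Podcast B"),
  Rating = c(5, 3),
  Review = c("Loved it", "It was fine"),
  country = c("gb", "us"),
  stringsAsFactors = FALSE
)

# Selecting by position depends on the column order staying the same.
by_position <- toy_reviews[, c(2, 3, 4, 5)]

# Selecting by name survives columns being added or reordered.
wanted <- c("podcast", "Rating", "Review", "country")
by_name <- toy_reviews[, wanted]

# Reverse the column order, then select by name again.
reordered <- toy_reviews[, rev(names(toy_reviews))]
by_name_after_reorder <- reordered[, wanted]
```

Here `by_name_after_reorder` has the same columns in the same order as `by_name`, whereas the positional indices would have returned different columns from the reordered data.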

7 Create Visualisation(s)

Finally, this code produces a reactive visualisation in exactly the same way as the DataTable above. Each of the visualisations created in Part 3 is included in the live version of this document. The code below can be adapted simply by entering the ggplot code from the visualisations concerned.

##```{r reviewcount, echo=FALSE}##
renderPlot({
  reviews_data_new <- reviews_data()
  reviews_data_new %>%
    ggplot() +
    aes(x = reorder(short_name, short_name, function(x) length(x))) +
    geom_bar(aes(fill = main_topic_name)) +
    coord_flip() +
    scale_fill_brewer(palette = "Set1", name = 'Main Topic') +
    xlab("Podcast") +
    ylab("Number of Reviews")
})
##```##  
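
The `reorder(short_name, short_name, function(x) length(x))` call in the plot is what sorts the bars by number of reviews rather than alphabetically. Here is a small base R sketch of what it does on its own, using invented podcast names; `short_name` here is just a toy vector, not the real column.

```r
# Toy vector standing in for the short_name column; names are invented.
# "alpha" appears three times, "zeta" twice and "mid" once.
short_name <- c("alpha", "alpha", "alpha", "mid", "zeta", "zeta")

# reorder() groups the values, applies the function to each group
# (here length(), i.e. the count) and orders the factor levels by the
# result, smallest first.
ordered_names <- reorder(short_name, short_name, function(x) length(x))

# Levels now run from fewest to most reviews, not alphabetically.
level_order <- levels(ordered_names)   # "mid" "zeta" "alpha"
```

Because the plot also uses `coord_flip()`, the last level ends up at the top of the chart, so the podcast with the most reviews in the current selection appears first.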

8 Running and Publishing your interactive document

To run your document locally, simply click ‘Run Document’ in your RStudio interface. To publish it online, click the ‘Publish’ button. For more information on deploying Shiny documents, see this tutorial.

I hope you’ve found this tutorial useful. Please feel free to drop me a line if you have any questions about this process, or if you would like to discuss ways we can work together.

Dr Craig Hamilton

My research interests include popular music, digital humanities and online cultures.