For this entry, will be visualizing suicide data from 1958 to 2015 with interactive plots to communicate insights to non-technical audience.
(14 min read)
The data set used in this post is the Suicide Rates Overview 1985 to 2016 data from Kaggle - an online community of data scientists and machine learning practitioners funded by Google. The data set was compiled from numerous sources of the Human development index (HDI) of United Nations Development Program (2018), World development indicators from World Banks (2018), the Suicide in the Twenty-First Century data set, and data from the suicide prevention program by WHO.
In this time and age, a lot of things we do is driven by data and conclusions we drawn from it. The question is, how? It is not like we are looking at tables of data every single time we want to decide on something. Data is oftentimes unstructured, unlabeled, and most of all, overwhelming when we don’t know what to do with it.
With the right tool and the right framework, we will know what to look for and how to look for it. After that, it is a matter of whether we can effectively communicate what we learned from the data to our audience. Data visualization is an effective way to tell stories about data to non-technical stakeholders (e.g., educators, mental health practitioners, or policy makers).
The plot can be even more engaging with the audience when we make it interactive, so that it can emphasize the part that really matters (while fading out the irrelevant part). That is what I aim to do with this entry, building interactive plots for the suicide data!
# Load necessary packages.
library(flexdashboard) # Dashboard package
library(highcharter) # Interactive data visualizations
library(plotly) # Interactive data visualizations
library(viridis) # Color gradients
library(tidyverse) # Metapackge for data management
library(countrycode) # Converts country names/codes
library(rjson) # JSON reader
library(crosstalk) # Provides interactivity for HTML widgets
library(DT) # Displaying data tables
plotly
, crosstalk
, flexdashboard
, and highcharter
packages to make the plot. We will also use glimpse
to check the data set of its content.# Read data.
data <- read.csv('master.csv') %>%
filter(year != 2016, # filter out 2016 and countries with 0 data.
country != 'Dominica',
country != 'Saint Kitts and Nevis')
# Fix the names of some of the countries in our data to match the country names
# used by our map later on so that they'll be interpreted and displayed.
data <- data %>%
mutate(country = fct_recode(country, "The Bahamas" = "Bahamas"),
country = fct_recode(country, "Cape Verde" = "Cabo Verde"),
country = fct_recode(country, "South Korea" = "Republic of Korea"),
country = fct_recode(country, "Russia" = "Russian Federation"),
country = fct_recode(country, "Republic of Serbia" = "Serbia"),
country = fct_recode(country, "United States of America" = "United States"))
# Reorder levels of age to be in chronological order.
data$age <- factor(data$age, levels = c("5-14 years", "15-24 years", "25-34 years", "35-54 years", "55-74 years", "75+ years"))
glimpse(data)
Rows: 27,612
Columns: 12
$ country <fct> Albania, Albania, Albania, Albania, Alban~
$ year <int> 1987, 1987, 1987, 1987, 1987, 1987, 1987,~
$ sex <chr> "male", "male", "female", "male", "male",~
$ age <fct> 15-24 years, 35-54 years, 15-24 years, 75~
$ suicides_no <int> 21, 16, 14, 1, 9, 1, 6, 4, 1, 0, 0, 0, 2,~
$ population <int> 312900, 308000, 289700, 21800, 274300, 35~
$ suicides.100k.pop <dbl> 6.71, 5.19, 4.83, 4.59, 3.28, 2.81, 2.15,~
$ country.year <chr> "Albania1987", "Albania1987", "Albania198~
$ HDI.for.year <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
$ gdp_for_year.... <chr> "2,156,624,900", "2,156,624,900", "2,156,~
$ gdp_per_capita.... <int> 796, 796, 796, 796, 796, 796, 796, 796, 7~
$ generation <chr> "Generation X", "Silent", "Generation X",~
# Create a custom theme for the plots.
custom_theme <- hc_theme(
colors = c('#5CACEE', 'green', 'red'),
chart = list(
backgroundColor = '#FAFAFA',
plotBorderColor = "black"),
xAxis = list(
gridLineColor = "E5E5E5",
labels = list(style = list(color = "#333333")),
lineColor = "#E5E5E5",
minorGridLineColor = "#E5E5E5",
tickColor = "#E5E5E5",
title = list(style = list(color = "#333333"))),
yAxis = list(
gridLineColor = "#E5E5E5",
labels = list(style = list(color = "#333333")),
lineColor = "#E5E5E5",
minorGridLineColor = "#E5E5E5",
tickColor = "#E5E5E5",
tickWidth = 1,
title = list(style = list(color = "#333333"))),
title = list(style = list(color = '#333333')),
subtitle = list(style = list(color = '#666666')),
legend = list(
itemStyle = list(color = "#333333"),
itemHoverStyle = list(color = "#FFF"),
itemHiddenStyle = list(color = "#606063")),
credits = list(style = list(color = "#666")),
itemHoverStyle = list(color = 'gray'))
# Create tibble for our line plot.
overall_tibble <- data %>%
select(year, suicides_no, population) %>%
group_by(year) %>%
summarise(suicide_capita = round((sum(suicides_no)/sum(population))*100000, 2))
# Create a line plot.
highchart() %>%
hc_add_series(overall_tibble, hcaes(x = year, y = suicide_capita, color = suicide_capita), type = "line") %>%
hc_tooltip(crosshairs = TRUE, borderWidth = 1.5, headerFormat = "", pointFormat = paste("Year: <b>{point.x}</b> <br> Suicides: <b>{point.y}</b>")) %>%
hc_title(text = "Worldwide suicides by year") %>%
hc_subtitle(text = "1985-2015") %>%
hc_xAxis(title = list(text = "Year")) %>%
hc_yAxis(title = list(text = "Suicides per 100K people"),
allowDecimals = FALSE,
plotLines = list(list(
color = "black", width = 1, dashStyle = "Dash",
value = mean(overall_tibble$suicide_capita),
label = list(text = "Mean = 13.12",
style = list(color = "black", fontSize = 11))))) %>%
hc_legend(enabled = FALSE) %>%
hc_add_theme(custom_theme)
# Create tibble for sex so we can use it when creating our line plot.
sex_tibble <- data %>%
select(year, sex, suicides_no, population) %>%
group_by(year, sex) %>%
summarise(suicide_capita = round((sum(suicides_no)/sum(population))*100000, 2))
# Pick color for gender.
sex_color <- c("#EE6AA7", "#87CEEB") # baby blue & pink
# Create line plot.
highchart() %>%
hc_add_series(sex_tibble, hcaes(x = year, y = suicide_capita, group = sex), type = "line", color = sex_color) %>%
hc_tooltip(crosshairs = TRUE, borderWidth = 1.5, headerFormat = "", pointFormat = paste("Year: <b>{point.x}</b> <br>","Gender: <b>{point.sex}</b><br>", "Suicides: <b>{point.y}</b>")) %>%
hc_title(text = "Worldwide suicides by Gender") %>%
hc_subtitle(text = "1985-2015") %>%
hc_xAxis(title = list(text = "Year")) %>%
hc_yAxis(title = list(text = "Suicides per 100K people"),
allowDecimals = FALSE,
plotLines = list(list(
color = "black", width = 1, dashStyle = "Dash",
value = mean(overall_tibble$suicide_capita),
label = list(text = "Mean = 13.12",
style = list(color = 'black', fontSize = 11))))) %>%
hc_add_theme(custom_theme)
# First, make a tibble of suicide by sex. We will use this for our pie chart.
pie_sex <- data %>%
select(sex, suicides_no, population) %>%
group_by(sex) %>%
summarise(suicide_capita = round((sum(suicides_no)/sum(population))*100000, 2))
# Create pie chart for sex.
highchart() %>%
hc_add_series(pie_sex, hcaes(x = sex, y = suicide_capita,
color = sex_color), type = "pie") %>%
hc_tooltip(borderWidth = 1.5, headerFormat = "", pointFormat = paste("Gender: <b>{point.sex} ({point.percentage:.1f}%)</b> <br> Suicides per 100K: <b>{point.y}</b>")) %>%
hc_title(text = "<b>Worldwide suicides by Gender</b>", style = (list(fontSize = '14px'))) %>%
hc_subtitle(text = "1985-2015", style = (list(fontSize = '10px'))) %>%
hc_plotOptions(pie = list(dataLabels = list(distance = 5,
style = list(fontSize = 10)),
size = 130)) %>%
hc_add_theme(custom_theme)
# Create tibble for age so we can use it when creating our line plot.
age_tibble <- data %>%
select(year, age, suicides_no, population) %>%
group_by(year, age) %>%
summarise(suicide_capita = round((sum(suicides_no)/sum(population))*100000, 2))
# Pick color for graph.
age_color <- rev(plasma(6))
# Create a line plot.
highchart() %>%
hc_add_series(age_tibble, hcaes(x = year, y = suicide_capita, group = age), type = "line", color = age_color) %>%
hc_tooltip(crosshairs = TRUE, borderWidth = 1.5, headerFormat = "", pointFormat = paste("Year: <b>{point.x}</b> <br>","Age: <b>{point.age}</b><br>", "Suicides: <b>{point.y}</b>")) %>%
hc_title(text = "Worldwide suicides by Age") %>%
hc_subtitle(text = "1985-2015") %>%
hc_xAxis(title = list(text = "Year")) %>%
hc_yAxis(title = list(text = "Suicides per 100K people"),
allowDecimals = FALSE,
plotLines = list(list(
color = "black", width = 1, dashStyle = "Dash",
value = mean(overall_tibble$suicide_capita),
label = list(text = "Mean = 13.12",
style = list(color = 'black', fontSize = 11))))) %>%
hc_add_theme(custom_theme)
# First, create a tibble of suicide by Age. We will use this for our pie chart.
pie_age <- data %>%
select(age, suicides_no, population) %>%
group_by(age) %>%
summarise(suicide_capita = round((sum(suicides_no)/sum(population))*100000, 2)) %>%
arrange(suicide_capita)
# Create pie chart for Age.
highchart() %>%
hc_add_series(pie_age, hcaes(x = age, y = suicide_capita,
color = age_color), type = "pie") %>%
hc_tooltip(borderWidth = 1.5, headerFormat = "", pointFormat = paste("Age: <b>{point.age} ({point.percentage:.1f}%)</b> <br> Suicides per 100K: <b>{point.y}</b>")) %>%
hc_title(text = "<b>Worldwide suicides by Age</b>", style = (list(fontSize = '14px'))) %>%
hc_subtitle(text = "1985-2015", style = (list(fontSize = '10px'))) %>%
hc_plotOptions(pie = list(dataLabels = list(distance = 5,
style = list(fontSize = 10)),
size = 130)) %>%
hc_add_theme(custom_theme)
# Create new column in our data for continent. Use countrycode() to extract continents from country names.
data$continent <- countrycode(sourcevar = data$country,
origin = "country.name",
destination = "continent")
# Reclassify countries that have been coded as 'Americas', by countrycode(), into 'North America' and 'South America'.
south_america <- c('Argentina', 'Brazil', 'Chile', 'Colombia', 'Ecuador', 'Guyana', 'Paraguay', 'Suriname', 'Uruguay')
data$continent[data$country %in% south_america] <- 'South America'
data$continent[data$continent=='Americas'] <- 'North America'
# Create a tibble for continent and sex.
continent_sex_tibble <- data %>%
select(continent, sex, suicides_no, population) %>%
group_by(continent, sex) %>%
summarize(suicide_capita = round((sum(suicides_no)/sum(population))*100000, 2))
# Create histogram of suicides by continent.
highchart() %>%
hc_add_series(continent_sex_tibble, hcaes(x = continent, y = suicide_capita, group = sex), type = "column") %>%
hc_colors(colors = sex_color) %>%
hc_title(text = "Suicides by continent and <b>Gender</b>", style = (list(fontSize = '14px'))) %>%
hc_subtitle(text = "1985-2015") %>%
hc_tooltip(borderWidth = 1.5, pointFormat = paste("Gender: <b> {point.sex} </b> <br> Suicides: <b>{point.y}</b>")) %>%
hc_xAxis(categories = c("Africa", "Asia", "Europe", "North <br> America", "Oceania", "South <br> America"), labels = list(style = list(fontSize = 8))) %>%
hc_yAxis(labels = list(style = list(fontSize = 10)),
title = list(text = "Suicides per 100K people",
style = list(fontSize = 10)),
plotLines = list(
list(color = "black", width = 1, dashStyle = "Dash",
value = mean(overall_tibble$suicide_capita),
label = list(text = "Mean = 13.12", style = list(color = "black", fontSize = 6))))) %>%
hc_legend(verticalAlign = 'top', enabled = FALSE) %>%
hc_add_theme(custom_theme)
# Create a tibble for continent and sex.
continent_age_tibble <- data %>%
select(continent, age, suicides_no, population) %>%
group_by(continent, age) %>%
summarize(suicide_capita = round((sum(suicides_no)/sum(population))*100000, 2))
# Create histogram of suicides by continent.
highchart() %>%
hc_add_series(continent_age_tibble, hcaes(x = continent, y = suicide_capita, group = age), type = "column") %>%
hc_colors(colors = age_color) %>%
hc_title(text = "Suicides by continent and <b>Age</b>", style = (list(fontSize = '14px'))) %>%
hc_subtitle(text = "1985-2015") %>%
hc_tooltip(borderWidth = 1.5, pointFormat = paste("Age: <b> {point.age} </b> <br> Suicides: <b>{point.y}</b>")) %>%
hc_xAxis(categories = c("Africa", "Asia", "Europe", "North <br> America", "Oceania", "South <br> America"), labels = list(style = list(fontSize = 8))) %>%
hc_yAxis(labels = list(style = list(fontSize = 10)),
title = list(text = "Suicides per 100K people",
style = list(fontSize = 10)),
plotLines = list(
list(color = "black", width = 1, dashStyle = "Dash",
value = mean(overall_tibble$suicide_capita),
label = list(text = "Mean = 13.12", style = list(color = "black", fontSize = 6))))) %>%
hc_legend(verticalAlign = 'top', enabled = FALSE) %>%
hc_add_theme(custom_theme)
For this part, I have imported the pre-downloaded world continent data from Highcharts map collection, but you can also download the map directly from the site with map_data <- download_map_data("custom/world-continents")
.
Here, we can compare suicide rate of each continent using color gradient to see which continent has the highest (or the lowest) suicide rate.
# Import continent map data.
map_data <- fromJSON(file = "world-continents.geo.json")
# Create a tibble for continent.
continent_tibble <- data %>%
select(continent, suicides_no, population) %>%
group_by(continent) %>%
summarize(suicide_capita = round((sum(suicides_no)/sum(population))*100000, 2)) %>%
arrange(suicide_capita)
# Create continent map with suicide data.
highchart() %>%
hc_add_series_map(map_data, continent_tibble, value = "suicide_capita", joinBy = c('name','continent'), name = "Suicides (per 100K people)") %>%
hc_add_series(continent_tibble, hcaes(x = continent, y = suicide_capita, color = suicide_capita), type = "pie", name = 'Suicides (per 100K people)') %>%
hc_colorAxis(stops = color_stops()) %>%
hc_title(text = "Suicides by Continent") %>%
hc_subtitle(text = "1985-2015") %>%
hc_tooltip(borderWidth = 1.5, valueSuffix = '') %>%
hc_plotOptions(
pie = list(center = c('10%', '80%'), size = 110, dataLabels = list(enabled = FALSE))) %>%
hc_add_theme(custom_theme)
# Create tibble for overall suicides by country
country_bar <- data %>%
select(country, suicides_no, population) %>%
group_by(country) %>%
summarise(suicide_capita = round((sum(suicides_no)/sum(population))*100000, 2)) %>%
arrange(desc(suicide_capita))
# Create interactive bar plot
highchart() %>%
hc_add_series(country_bar, hcaes(x = country, y = suicide_capita, color = suicide_capita), type = "bar") %>%
hc_tooltip(borderWidth = 1.5,
pointFormat = paste("Suicides: <b>{point.y}</b>")) %>%
hc_legend(enabled = FALSE) %>%
hc_title(text = "Suicides by country") %>%
hc_subtitle(text = "1985-2015") %>%
hc_xAxis(categories = country_bar$country,
labels = list(step = 1),
min = 0, max = 25,
scrollbar = list(enabled = TRUE)) %>%
hc_yAxis(title = list(text = "Suicides per 100K people")) %>%
hc_plotOptions(bar = list(stacking = "normal",
pointPadding = 0, groupPadding = 0, borderWidth = 0.5)) %>%
hc_add_theme(custom_theme)
# Create tibble for suicide by countries and sex.
country_bar_sex <- data %>%
select(country, sex, suicides_no, population) %>%
group_by(country, sex) %>%
summarise(suicide_capita = round((sum(suicides_no)/sum(population))*100000, 2))
country_tibble <- data %>%
select(country, suicides_no, population) %>%
group_by(country) %>%
summarise(suicide_capita = round((sum(suicides_no)/sum(population))*100000, 2))
# Create bar chart of suicide by sex.
highchart() %>%
hc_add_series(country_bar_sex, hcaes(x = country, y = suicide_capita, group = sex), type = "bar", color = sex_color) %>%
hc_tooltip(borderWidth = 1.5, pointFormat = paste("Gender: <b>{point.sex} ({point.percentage:.1f}%)</b> <br> Suicides per 100K: <b>{point.y}</b>")) %>%
hc_legend(enabled = TRUE, colorByPoint = TRUE) %>%
hc_title(text = "Suicides by country and gender") %>%
hc_subtitle(text = "1985-2015") %>%
hc_xAxis(categories = country_tibble$country,
labels = list(step = 1),
min = 0, max = 25,
scrollbar = list(enabled = TRUE)) %>%
hc_yAxis(title = list(text = "Percentage of total suicides")) %>%
hc_plotOptions(bar = list(stacking = "percent",
pointPadding = 0, groupPadding = 0, borderWidth = 0.4)) %>%
hc_add_theme(custom_theme)
# Create tibble for suicide by countries and age
country_bar_age <- data %>%
select(country, age, suicides_no, population) %>%
group_by(country, age) %>%
summarise(suicide_capita = round((sum(suicides_no)/sum(population))*100000, 2))
# Create interactive bar plot.
highchart() %>%
hc_add_series(country_bar_age, hcaes(x = country, y = suicide_capita, group = age), type = "bar", color = age_color) %>%
hc_tooltip(borderWidth = 1.5, pointFormat = paste("Age: <b>{point.age} ({point.percentage:.1f}%)</b> <br> Suicides per 100K: <b>{point.y}</b>")) %>%
hc_title(text = "Suicides by country and age") %>%
hc_subtitle(text = "1985-2015") %>%
hc_xAxis(categories = country_tibble$country,
labels = list(step = 1),
min = 0, max = 25,
scrollbar = list(enabled = TRUE)) %>%
hc_yAxis(title = list(text = "Percent of total suicides")) %>%
hc_plotOptions(bar = list(stacking = "percent",
pointPadding = 0, groupPadding = 0, borderWidth = 0.5)) %>%
hc_add_theme(custom_theme)
# Create a tibble with suicide per capita by country for 1985-2015.
country_tibble <- data %>%
select(country, suicides_no, population) %>%
group_by(country) %>%
summarize(suicide_capita = round((sum(suicides_no)/sum(population))*100000, 2))
# Create interactive world map.
highchart() %>%
hc_add_series_map(worldgeojson, country_tibble, value = "suicide_capita", joinBy = c('name','country')) %>%
hc_colorAxis(stops = color_stops()) %>%
hc_title(text = "Suicides by Country") %>%
hc_subtitle(text = "1985-2015") %>%
hc_tooltip(borderWidth = 1.5, headerFormat = "", valueSuffix = " suicides (per 100K people)") %>%
hc_add_theme(custom_theme)
Now this is the fanciest (and the hardest part to compile). We can use crosstalk
package to link our data set across country filter, country data table, and line plot of suicide rate trend across years. Try typing country names that you are interested in in the search box, and the table will filter out suicide data for those countries for you to compare, as well as displaying a line plot for them as well.
You can use plotly
option to download the plot as png, compare data of the two countries on hover, and darg a box to zoom in at a specific period of time.
# Create tibble for our line plot.
country_year_tibble <- data %>%
select(country, year, suicides_no, population) %>%
group_by(country, year) %>%
summarise(suicide_capita = round((sum(suicides_no)/sum(population))*100000, 2))
# Create shared data that will be used to link filters, data table, and line plot.
shared_data <- SharedData$new(country_year_tibble)
# Create filter for year and country. These filters will adjust the DT datatable and PLOTLY plot.
filter_slider("year", "Year", shared_data, ~year, step = 1)
filter_select("country", "Country", shared_data, ~country, allLevels = TRUE, multiple = TRUE)
# Set a random seed. We will do this so that we can reproduce the random sample of colors we will use for our next graph.
set.seed(80085)
# Create line graph.
plot_ly(shared_data, x = ~year, y = ~suicide_capita,
color = ~country, colors = sample(colours(), 120),
type = 'scatter', mode = 'lines',
hoverinfo = 'text', text = ~paste("Country: ", country, '<br>Year: ', year, "<br>Suicides: ", suicide_capita)) %>%
layout(showlegend = FALSE,
title = "Suicide by country",
xaxis = list(title = "Year"),
yaxis = list(title = "Suicides per 100K people")) %>%
layout(plot_bgcolor = 'transparent') %>%
layout(paper_bgcolor = 'transparent') %>%
add_markers()
This entry is just me playing around with codes to generate interactive plots. While it may not be as flashy as machine learning or natural language processing, data visualization is an important part of data work to communicate results to our audience without relying to heavily on texts and numbers.
A little bit on graphing. When we make a plot, simplicity and efficiency are vital. We cannot just throw every last bit of information up there and expect the audience to understand everything as us, who were invested in that data for weeks, months, or even years. Use only the most relevant information, so that the main information doesn’t get overshadowed by bits and pieces such as sound effect, animation, or unnecessary numbers. Again, thank you very much for reading!
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Wongvorachan (2021, Dec. 9). Tarid Wongvorachan: Interactive plots for Suicide Data. Retrieved from https://taridwong.github.io/posts/2021-12-08-interactive-dashboard-for-suicide-data/
BibTeX citation
@misc{wongvorachan2021interactive, author = {Wongvorachan, Tarid}, title = {Tarid Wongvorachan: Interactive plots for Suicide Data}, url = {https://taridwong.github.io/posts/2021-12-08-interactive-dashboard-for-suicide-data/}, year = {2021} }