Reputation: 3
Trying to build a simple bar chart in R.
This is the link to the data (https://data.world/makeovermonday/2020w3-is-it-time-to-treat-sugar-like-smoking). I need to build a simple bar chart that shows sugar intake only for children (there are 3 rows with children), only for this specific column "(2014/15-2015/16)". I know it has something to do with select() and filter() but am having trouble - appreciate any help!
Attaching what I did in Python and Tableau, which I am trying to replicate in R: [image]
Upvotes: 0
Views: 384
Reputation: 1
In R, select() is used to extract columns and filter() to extract rows. I have renamed column1 as 'sugar'. Below is the code, hope this helps!
# Libraries used
library(ggplot2)
library(readr)
library(dplyr)
library(tidyverse)
# Loading data
df = read_csv('2020W3.csv')
# Renaming first column to sugar
df = df %>% rename(sugar = free_sugars_intake_of_total_energy_in_all_age_groups_for_all_paired_years_of_the_ndns_rolling_programme)
#Filtering rows
df1 = df %>% filter(str_starts(sugar, 'Child'))
#plotting barplot (refer to columns by bare name inside aes(), not via df1$...)
ggplot(df1, aes(sugar, `2014_15_2015_16`)) + geom_bar(stat = "identity") +
  ggtitle('Children total intake as % of total energy') + labs(x = 'Age Bracket', y = '% of sugar intake')
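To see the select()/filter() distinction in isolation, here is a minimal sketch on a toy data frame. The column names (group, period_a, period_b) and all values are made-up placeholders, not the real survey figures, and base R's startsWith() is used instead of str_starts() only to keep the dependencies down to dplyr.

```r
library(dplyr)

# Toy data with made-up placeholder values (NOT the real survey figures);
# the column names are hypothetical stand-ins for the ones in the CSV.
toy <- data.frame(
  group    = c("Children 1.5-3 years", "Children 4-10 years", "Adults 19-64 years"),
  period_a = c(1, 2, 3),
  period_b = c(4, 5, 6),
  stringsAsFactors = FALSE
)

# filter() keeps the rows that satisfy a condition;
# select() keeps the named columns.
children <- toy %>%
  filter(startsWith(group, "Child")) %>%
  select(group, period_b)

print(children)
```

Only the two "Children" rows and the two selected columns remain, which is the shape a bar chart of one period needs.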
Upvotes: 0
Reputation: 133
A key part of ggplot2 is that data must be tidy for it to work properly. This can be a bit of a hassle sometimes, but it usually pays off.
This is my full solution. Most of the work goes into getting the data into tidy format, after which the ggplot2 part is a lot easier:
library(dplyr)
library(ggplot2)
library(readxl)
library(tidyr)
sugar <- read_excel("data/MakeoverMondayData.xlsx")
children_2014_2016 <- sugar %>%
  gather("period", "intake", -1) %>%
  separate(1, c("category", "age"), sep = " ", extra = "merge") %>%
  filter(
    category == "Children",
    period == "(2014/15-2015/16)"
  ) %>%
  mutate(age = factor(age, levels = c("1.5-3 years", "4-10 years", "11-18 years"), ordered = TRUE))
# Position and text of the annotation for the dashed reference line
label_ <- data.frame(x = 2, y = 5, label = "5% of total energy")
children_2014_2016 %>%
  ggplot() +
  geom_bar(stat = "identity", fill = "lightblue", aes(x = age, y = intake)) +
  geom_hline(yintercept = 5, linetype = "dashed", colour = "grey") +
  geom_text(data = label_, aes(x = x, label = label, y = y)) +
  ggtitle("Children's free sugars intake (as % of total energy)") +
  labs(x = "Age", y = "Free sugars as % of total energy") +
  theme_minimal()
Now I'll try to explain how it works:
1. I use tidyr::gather on the columns to get two new columns, period and intake. The -1 means that I'm gathering all but the first column.
gather("period", "intake", -1)
2. I separate the first column into category (Children, Adult, etc.) and age. The extra = "merge" argument is there because splitting on whitespace would yield more than two pieces, so I want to merge the extra pieces into the last column.
separate(1, c("category", "age"), sep = " ", extra = "merge")
3. I filter to keep only the children rows and the period of interest.
filter(
category == "Children",
period == "(2014/15-2015/16)"
) %>%
4. I mutate the age column to be an ordered factor, so I can control the order in which the categories appear in the plot.
mutate(age = factor(age, levels = c("1.5-3 years", "4-10 years", "11-18 years"), ordered = TRUE))
After this, everything but the label "5% of total energy" is pretty standard ggplot2, I think.
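As a side note, gather() is marked as superseded in current tidyr releases in favour of pivot_longer(). The sketch below shows the same wide-to-long reshape with pivot_longer() on a toy table. The values are made-up placeholders (not the real survey figures), and the second period column name is an assumption about how the other columns in the spreadsheet are labelled.

```r
library(tidyr)

# Toy wide table with made-up placeholder values (NOT the real survey figures).
# "(2012/13-2013/14)" is an assumed column name used for illustration only.
wide <- data.frame(
  group = c("Children 1.5-3 years", "Children 4-10 years"),
  `(2012/13-2013/14)` = c(1, 2),
  `(2014/15-2015/16)` = c(3, 4),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# One row per group/period pair: every column except the first is lengthened.
long <- pivot_longer(wide, cols = -1, names_to = "period", values_to = "intake")

# Split "Children 1.5-3 years" into category and age, as in the answer above.
long <- separate(long, group, c("category", "age"), sep = " ", extra = "merge")

print(long)
```

The result has one row per category, age and period combination, which is the tidy shape that filter() and ggplot() then work on.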
Upvotes: 0
Reputation: 1
#Load the xlsx library to read the Excel sheet
library(xlsx)
#Load the tidyverse library for ggplot and others (one line of code for many handy libraries)
library(tidyverse)
#Read the Excel sheet, and specify sheet #1
#Setting stringsAsFactors is optional, so it is commented out; Age_group is converted to a factor explicitly further down
data.chld.sugar <- read.xlsx("2020W3.xlsx", 1)#, stringsAsFactors = F)
#Check the data; dimension and header
dim(data.chld.sugar)
head(data.chld.sugar)
#Let's clean up the column names of the data frame. This will make plotting easier
colnames(data.chld.sugar) <- c("Age_group", "Y2008_2009", "Y2010_2011", "Y2012_2013", "Y2014_2015")
#Keep only the Children groups by performing the filter on Age_group.
data.chld.sugar.1 <- data.chld.sugar %>%
  filter(Age_group %in% c("Children 1.5-3 years", "Children 4-10 years", "Children 11-18 years"))
#Make sure Age_group is a factor and re-level it so the Children groups appear in the correct order
data.chld.sugar.1$Age_group <- factor(data.chld.sugar.1$Age_group, levels=c("Children 1.5-3 years", "Children 4-10 years", "Children 11-18 years"))
#Using ggplot, create the bar plot and try to make it very close to the Python output
ggplot(data = data.chld.sugar.1,
       mapping = aes(x = Age_group,
                     y = Y2014_2015))+
  geom_bar(stat="identity",width=.6)+ #reduce the width of the bars to mimic the Python plot output
  scale_y_continuous(name="% of sugar intake \n", breaks=c(0,2,4,6,8,10,12,14))+ #using the same breaks
  xlab("Age bracket")+
  ggtitle("Children's sugar intake as a % of total energy")
Upvotes: 0
Reputation: 16178
Without using the dplyr package, and with only ggplot2, you can use subset to select part of your data frame and scale_x_discrete to order your x-axis. You can also use geom_col instead of geom_bar(stat = "identity"):
library(ggplot2)
colnames(df)[1] = "Age Bracket"
ggplot(data = subset(df, grepl("Children",`Age Bracket`)),
aes(x = `Age Bracket`, y = `(2014/15-2015/16)`))+
geom_col(fill = "#b2dbf4",width = 0.8 )+
geom_hline(yintercept = 5, colour = "gray", lty = 2) +
scale_x_discrete(limits = c("Children 1.5-3 years","Children 4-10 years","Children 11-18 years"))+
labs(y = "Free sugars as % of total energy",
title = "Children's sugar intake as a % of total energy\n(2014/15-2015/16)")
Upvotes: 1
Reputation: 701
You can do the following:
library(ggplot2)
library(dplyr)
library(httr)
library(readxl)
GET("https://query.data.world/s/wxcskq64mo3kn4zga2fjpm2aaucmxk", write_disk(tf <- tempfile(fileext = ".xlsx")))
df <- read_excel(tf)
names(df) <- c("age_bracket", "years_08_09", "years_10_11", "years_12_13", "years_14_15")
df$age_bracket <- factor(df$age_bracket, levels = df$age_bracket, ordered = TRUE)
ggplot(
data = df %>% filter(grepl("Children", age_bracket)),
aes(
x = age_bracket,
y = years_14_15
)
) +
geom_bar(stat = "identity", fill = "#CFE5F3") +
geom_hline(yintercept = 5, colour = "gray", lty = 2) +
labs(
x = "Age Bracket",
y = "Free sugars as % of total energy",
title = "Children's sugar intake as a % of total energy"
) +
theme_bw()
Upvotes: 1
Reputation: 40
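Here is a very ugly graph, but it should give you something to start with. Note that it plots every age group for the 2008-2009 column, so you will still need to filter to the children rows and switch to the column you want.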
library(ggplot2)
library(dplyr)
data <- read.csv("C:/2020W3.csv")
names(data) <- c("AgeGroup", "2008-2009", "2010-2011", "2012-2013", "2014-2015")
data$AgeGroup <- as.factor(data$AgeGroup)
ggplot(
data = data %>% select(AgeGroup, `2008-2009`),
aes(
x = AgeGroup,
y = `2008-2009`
)
) +
geom_bar(stat = "identity") +
geom_hline(yintercept = 5)
Happy to help further if needed.
Upvotes: 1