StorybyData

Reputation: 3

Bar Chart in R - I can do this in Python and Tableau but having trouble in R

Trying to build a simple bar chart in R.

This is the link to the data (https://data.world/makeovermonday/2020w3-is-it-time-to-treat-sugar-like-smoking). I need to build a simple bar chart that shows sugar intake only for children (there are 3 rows with children), only for this specific column "(2014/15-2015/16)". I know it has something to do with select() and filter() but am having trouble - appreciate any help!

Attaching what I did in Python and Tableau. Trying to replicate in R: Image

Upvotes: 0

Views: 384

Answers (6)

user10598279

Reputation: 1

In R, select() extracts columns and filter() extracts rows. I renamed the first column to 'sugar'. The code is below, hope this helps!

# Libraries used
library(ggplot2)
library(readr)
library(dplyr)
library(tidyverse)

# Loading data
df = read_csv('2020W3.csv')

# Renaming first column to sugar
df = df %>% rename(sugar = free_sugars_intake_of_total_energy_in_all_age_groups_for_all_paired_years_of_the_ndns_rolling_programme)

#Filtering rows
df1 = df %>% filter(str_starts(sugar, 'Child')) 

#plotting barplot
# Refer to columns by name inside aes(); df1$... bypasses ggplot2's data masking
ggplot(df1, aes(sugar, `2014_15_2015_16`)) + geom_bar(stat = "identity") + 
  ggtitle('Children total intake as % of total energy') + labs(x = 'Age Bracket', y = '% of sugar intake') 
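
To make the select()/filter() split concrete, here is a minimal sketch on a toy data frame. The column names (group, y2012_13, y2014_15) and the numbers are placeholders made up for illustration; they are not the real NDNS figures or the real column names.

```r
library(dplyr)

# Hypothetical stand-in for the real table; the values are placeholders.
toy <- data.frame(
  group    = c("Children 1.5-3 years", "Children 4-10 years", "Adults 19-64 years"),
  y2012_13 = c(1, 2, 3),
  y2014_15 = c(4, 5, 6)
)

children <- toy %>%
  filter(startsWith(group, "Children")) %>%  # filter() keeps matching rows
  select(group, y2014_15)                    # select() keeps the named columns

print(children)
```

The result has only the two children rows and only the two columns needed for the plot.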


Upvotes: 0

toneloy

Reputation: 133

A key part of ggplot2 is that data must be tidy for it to work properly. This can be a bit of a hassle sometimes, but it usually pays off.

This is my full solution. Most of the work goes into getting the data into tidy format, after which the ggplot2 part is a lot easier:

library(dplyr)
library(ggplot2)
library(readxl)
library(tidyr)

sugar <- read_excel("data/MakeoverMondayData.xlsx")
children_2014_2016 <- sugar %>% 
  gather("period", "intake", -1) %>% 
  separate(1, c("category", "age"), sep = " ", extra = "merge") %>% 
  filter(
    category == "Children",
    period == "(2014/15-2015/16)"
  ) %>% 
  mutate(age = factor(age, levels = c("1.5-3 years", "4-10 years", "11-18 years"), ordered = TRUE))

label_ <- data.frame(x = 2, y = 5, label = "5% of total energy")

children_2014_2016 %>% 
  ggplot() + 
  geom_bar(stat = "identity", fill = "lightblue", aes(x = age, y = intake)) + 
  geom_hline(yintercept = 5, linetype = "dashed", colour = "grey") +
  geom_text(data = label_, aes(x = x, label = label, y = y)) +
  ggtitle("Children's free sugars intake (as % of total energy)") +
  labs(x = "Age", y = "Free sugars as % of total energy") +
  theme_minimal()


Now I'll try to explain how it works:

  1. The first step is to make the data tidy. For that, I use tidyr::gather to collapse the year columns into two new columns, period and intake. The -1 means that I'm gathering all but the first column.
gather("period", "intake", -1)
  2. Separate the first column so I have better control over the filtering in the next step. I'm separating the first column into two new columns, category (Children, Adult, etc.) and age. The extra = "merge" argument is there because splitting on whitespace would produce more than two pieces, so the extra pieces are merged into the last column.
separate(1, c("category", "age"), sep = " ", extra = "merge")
  3. Filter by category and period. This is fairly straightforward.
filter(
  category == "Children",
  period == "(2014/15-2015/16)"
)
  4. Mutate the age column to be an ordered factor, so I can control the order in which the categories appear in the plot.
mutate(age = factor(age, levels = c("1.5-3 years", "4-10 years", "11-18 years"), ordered = TRUE))

After this, everything but the label "5% of total energy" is pretty standard ggplot2, I think.
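
The reshaping steps can be tried on a toy table without downloading the file. This is only a sketch: the numbers are made-up placeholders, and it uses pivot_longer(), which tidyr now recommends in place of gather(), rather than the gather() call from the answer.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table shaped like the source file; the values are placeholders.
toy <- data.frame(
  group = c("Children 1.5-3 years", "Children 4-10 years", "Adults 19-64 years"),
  `(2012/13-2013/14)` = c(1, 2, 3),
  `(2014/15-2015/16)` = c(4, 5, 6),
  check.names = FALSE  # keep the period labels as column names
)

long <- toy %>%
  # one row per group/period pair instead of one column per period
  pivot_longer(-group, names_to = "period", values_to = "intake") %>%
  # split "Children 1.5-3 years" into "Children" and "1.5-3 years"
  separate(group, c("category", "age"), sep = " ", extra = "merge") %>%
  filter(category == "Children", period == "(2014/15-2015/16)")

print(long)
```

After the filter, only the two children rows for the requested period remain, with age holding the bracket text that the plot uses on the x-axis.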

Upvotes: 0

Zaid H

Reputation: 1

#Load the xlsx library to read the Excel sheet
library(xlsx)
#Load the tidyverse library for ggplot2 and others (one line loads many handy libraries)
library(tidyverse)

#Read the Excel sheet, and specify sheet #1
#Setting stringsAsFactors is optional, so it is commented out; Age_group is converted explicitly further down
data.chld.sugar <- read.xlsx("2020W3.xlsx", 1)#, stringsAsFactors = F)

#Check the data; dimension and header
dim(data.chld.sugar)
head(data.chld.sugar)


#Let's clean up the column names of the data frame. This will make plotting easier
colnames(data.chld.sugar) <- c("Age_group", "Y2008_2009", "Y2010_2011", "Y2012_2013", "Y2014_2015")



#Keep only the Children groups by filtering on Age_group.
data.chld.sugar.1 <- data.chld.sugar %>%
  filter(Age_group %in% c("Children 1.5-3 years", "Children 4-10 years", "Children 11-18 years"))

#Make sure Age_group is a factor and re-level to order the Children's group in the correct order
data.chld.sugar.1$Age_group <- factor(data.chld.sugar.1$Age_group, levels=c("Children 1.5-3 years", "Children 4-10 years", "Children 11-18 years"))


#Using ggplot, will create the Bar plot and will try to make it very close to the Python output
ggplot(data = data.chld.sugar.1, 
         mapping = aes(x = Age_group,
                       y = Y2014_2015))+
  geom_bar(stat="identity",width=.6)+ #reduce the width of the bars to mimic the Python plot output
  scale_y_continuous(name="% of sugar intake \n", breaks=c(0,2,4,6,8,10,12,14))+#using the same breaks
  xlab("Age bracket")+
  ggtitle("Children's sugar intake as a % of total energy")

Upvotes: 0

dc37

Reputation: 16178

Without the dplyr package, using only ggplot2, you can use subset() to select part of your data frame and scale_x_discrete() to order your x-axis. You can also use geom_col() instead of geom_bar(stat = "identity"):

library(ggplot2)
colnames(df)[1] = "Age Bracket"
ggplot(data = subset(df, grepl("Children",`Age Bracket`)), 
       aes(x = `Age Bracket`, y = `(2014/15-2015/16)`))+
  geom_col(fill = "#b2dbf4",width = 0.8 )+
  geom_hline(yintercept = 5, colour = "gray", lty = 2) +
  scale_x_discrete(limits = c("Children 1.5-3 years","Children 4-10 years","Children 11-18 years"))+
  labs(y = "Free sugars as % of total energy", 
       title = "Children's sugar intake as a % of total energy\n(2014/15-2015/16)") 
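
The ordering matters because a character x-axis is sorted alphabetically, which would put "Children 11-18 years" before "Children 4-10 years". Here is a small base R sketch of the same subset-and-order idea. It uses factor levels rather than scale_x_discrete(limits = ...), and the data frame and its values are made-up placeholders.

```r
# Hypothetical data; the intake values are placeholders, not the real figures.
toy <- data.frame(
  age_bracket = c("Children 11-18 years", "Adults 19-64 years",
                  "Children 1.5-3 years", "Children 4-10 years"),
  intake = c(3, 9, 1, 2),
  stringsAsFactors = FALSE
)

# Keep only the rows whose label contains "Children"
children <- subset(toy, grepl("Children", age_bracket))

# Fix the display order explicitly instead of relying on alphabetical sorting
wanted <- c("Children 1.5-3 years", "Children 4-10 years", "Children 11-18 years")
children$age_bracket <- factor(children$age_bracket, levels = wanted)
children <- children[order(children$age_bracket), ]

print(children)
```

Passing a factor with explicit levels to ggplot2 gives the same bar order as setting limits on the scale.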


Upvotes: 1

mayrop

Reputation: 701

You can do the following:

library(ggplot2)
library(dplyr)
library(httr)
library(readxl)

GET("https://query.data.world/s/wxcskq64mo3kn4zga2fjpm2aaucmxk", write_disk(tf <- tempfile(fileext = ".xlsx")))
df <- read_excel(tf)
names(df) <- c("age_bracket", "years_08_09", "years_10_11", "years_12_13", "years_14_15")

df$age_bracket <- factor(df$age_bracket, levels = df$age_bracket, ordered = TRUE)

ggplot(
  data = df %>% filter(grepl("Children", age_bracket)),
    aes(
      x = age_bracket,
      y = years_14_15  # the question asks for the (2014/15-2015/16) column
    )
  ) +
  geom_bar(stat = "identity", fill = "#CFE5F3") +
  geom_hline(yintercept = 5, colour = "gray", lty = 2) +
  labs(
    x = "Age Bracket", 
    y = "Free sugars as % of total energy", 
    title = "Children's sugar intake as a % of total energy"
  ) +
  theme_bw()

Upvotes: 1

PatientSnake

Reputation: 40

Here is a very ugly graph, but it should give you something to start with.

library(ggplot2)
library(dplyr)

data <- read.csv("C:/2020W3.csv")

names(data) <- c("AgeGroup", "2008-2009", "2010-2011", "2012-2013", "2014-2015")

data$AgeGroup <- as.factor(data$AgeGroup)
ggplot(
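  # keep only the children rows and the 2014-2015 column the question asks about
  data = data %>% filter(grepl("Children", AgeGroup)) %>% select(AgeGroup, `2014-2015`),
  aes(
    x = AgeGroup,
    y = `2014-2015`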
  )
) +
  geom_bar(stat = "identity") +
  geom_hline(yintercept = 5)

Happy to help further if needed.

Upvotes: 1
