user3710760

Reputation: 567

Accumulating histogram from multiple columns in R

I have the following data

ID | Category (1-5) | Task1(in min) | Task2(in min) | Task3(in min)

I would like to create a histogram plot with the different Categories on the x-axis and accumulated duration of Tasks 1, 2, 3 (coloured correspondingly) on the y-axis.

Is this possible in R without having to change my raw data? It seems that ggplot only takes one column but not multiple ones.

Edit: My (rather poor) attempt was

library(ggplot2)
ggplot(dataset) + geom_col(aes(x=Category, y=Task1, fill=Task2))

I couldn't get my head around putting multiple columns in fill.

Here's the dput of the sample data

dataset <- structure(list(ID = c(6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25), Category = c("5 - Expert", "2 - Novice", "3 - Intermediate", "5 - Expert", "2 - Novice", "3 - Intermediate", "3 - Intermediate", "3 - Intermediate", "2 - Novice", "3 - Intermediate", "2 - Novice", "4 - Advanced", "2 - Novice", "3 - Intermediate", "2 - Novice", "5 - Expert", "4 - Advanced", "2 - Novice", "2 - Novice", "3 - Intermediate"), Task1 = structure(c(300, 360, 240, 180, 180, 240, 240, 360, 300, 300, 180, 360, 240, 240, 240, 300, 240, 240, 240, 240), class = c("hms", "difftime"), units = "secs"), Task2 = structure(c(480, 360, 660, 420, 660, 240, 660, 540, 780, 360, 540, 720, 360, 480, 540, 300, 420, 600, 240, 660), class = c("hms", "difftime"), units = "secs"), Task3 = structure(c(360, 480, 240, 300, 240, 240, 240, 240, 240, 180, 240, 180, 120, 120, 240, 240, 240, 240, 300, 240), class = c("hms", "difftime"), units = "secs")), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"))

Upvotes: 0

Views: 4088

Answers (2)

tjebo

Reputation: 23737

You were very close. Make your data long. Here is a solution using ggplot2.

library(tidyverse)
dataset <- structure(list(ID = c(6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25), Category = c("5 - Expert", "2 - Novice", "3 - Intermediate", "5 - Expert", "2 - Novice", "3 - Intermediate", "3 - Intermediate", "3 - Intermediate", "2 - Novice", "3 - Intermediate", "2 - Novice", "4 - Advanced", "2 - Novice", "3 - Intermediate", "2 - Novice", "5 - Expert", "4 - Advanced", "2 - Novice", "2 - Novice", "3 - Intermediate"), Task1 = structure(c(300, 360, 240, 180, 180, 240, 240, 360, 300, 300, 180, 360, 240, 240, 240, 300, 240, 240, 240, 240), class = c("hms", "difftime"), units = "secs"), Task2 = structure(c(480, 360, 660, 420, 660, 240, 660, 540, 780, 360, 540, 720, 360, 480, 540, 300, 420, 600, 240, 660), class = c("hms", "difftime"), units = "secs"), Task3 = structure(c(360, 480, 240, 300, 240, 240, 240, 240, 240, 180, 240, 180, 120, 120, 240, 240, 240, 240, 300, 240), class = c("hms", "difftime"), units = "secs")), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"))

dataset_long <- dataset %>% gather(task, value, Task1:Task3)

ggplot(dataset_long) + geom_col(aes(x = Category, y = value, fill = task))

Created on 2018-12-18 by the reprex package (v0.2.1)

I hope this comes close to your desired output. Reshaping does not change your raw data, since it only creates a long-format copy, but working with R requires a bit of flexibility in shaping your data. I would guess that wrangling your data into the correct form/shape is about 95% of the work needed for your analysis and visualisation tasks in R.
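For newer tidyr versions (1.0.0 and later), gather() is superseded by pivot_longer(). A minimal sketch of the same idea follows, assuming dplyr >= 1.0 and a small made-up stand-in data frame (`toy`, with plain difftime columns instead of hms); it also converts the durations to minutes and sums them per category and task before plotting. The names `toy`, `toy_long`, `totals`, and `minutes` are only illustrative.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Made-up stand-in for the question's data (assumption: durations stored as difftime in seconds).
toy <- data.frame(
  ID       = 1:4,
  Category = c("2 - Novice", "2 - Novice", "3 - Intermediate", "5 - Expert"),
  Task1    = as.difftime(c(300, 360, 240, 180), units = "secs"),
  Task2    = as.difftime(c(480, 360, 660, 420), units = "secs"),
  Task3    = as.difftime(c(360, 480, 240, 300), units = "secs")
)

# Convert each task column to numeric minutes, then reshape to one row per ID and task.
toy_long <- toy %>%
  mutate(across(Task1:Task3, ~ as.numeric(.x, units = "mins"))) %>%
  pivot_longer(Task1:Task3, names_to = "task", values_to = "minutes")

# Accumulate the duration per category and task.
totals <- toy_long %>%
  group_by(Category, task) %>%
  summarise(minutes = sum(minutes), .groups = "drop")

# Stacked bars: one bar per category, one coloured segment per task.
p <- ggplot(totals, aes(x = Category, y = minutes, fill = task)) +
  geom_col() +
  labs(y = "Total duration (min)", fill = "Task")
```

Printing `p` draws the plot. Summing explicitly is optional, because geom_col() stacks the individual rows anyway, but it makes the accumulated values available as a table.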

Upvotes: 2

jon

Reputation: 370

I don't think you want a histogram. Histograms are frequency distributions that have counts on the y-axis and some continuous variable on the x-axis. Thus you are really only plotting a single variable.
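To illustrate what a histogram does, here is a small sketch with made-up durations (the data frame `durations` and its values are hypothetical): a single continuous variable is binned and the rows in each bin are counted.

```r
library(ggplot2)

# Hypothetical durations in minutes for a single task (made-up values).
durations <- data.frame(minutes = c(3, 4, 4, 4, 5, 5, 6, 6, 4, 3))

# A histogram bins one continuous variable and counts the rows per bin.
p_hist <- ggplot(durations, aes(x = minutes)) +
  geom_histogram(binwidth = 1)

# The computed bin counts can be inspected without drawing the plot.
bins <- ggplot_build(p_hist)$data[[1]]
```

The counts in `bins$count` add up to the number of rows, which is why a histogram cannot show summed durations per category.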

To get the category on the x-axis and cumulative time on the y-axis, you want to use geom_bar(). Since each category is its own bar on the x-axis, you don't need to color them separately, but I did so by setting fill = Category inside aes() in the ggplot() call, just to illustrate.

Example dataframe:

df <- data.frame(Category = c("Cat1", "Cat2", "Cat3", "Cat4", "Cat5"),
                 Task1 = rnorm(5,7,0.5),
                 Task2 = rnorm(5,8,0.5),
                 Task3 = rnorm(5,9,0.5))

Example solution:

library(dplyr)   # provides %>% and mutate()
library(ggplot2)

df %>%
  mutate(TaskTime = Task1 + Task2 + Task3) %>% # Creating cumulative time per row
  ggplot(aes(x = Category, y = TaskTime, fill = Category)) + # Passing plot arguments
  geom_bar(stat = "identity") # Bar heights taken from TaskTime rather than counts

Upvotes: -1
