Tamas

Reputation: 786

Reformatting data frame to be able to plot bar charts with ggplot2

The question is about reformatting a data frame (df) so that three bar charts can be displayed on the same diagram with ggplot2. Thank you for each response!

The data I have in df:

colA,    colB,    colC
label1,  label1,  label2
label3,  label1,  label3
label4,  label4,  label2
label5,  label4,  label5

With these data I can create a bar chart for a single column with the commands below, which show the count of each label in that column.

  pl <- ggplot(df,aes(x=colA))
  pl1 <- pl + geom_bar() 
  pl1 <- pl1 + theme(axis.text.x = element_text(angle = 90, hjust = 1))
  pl1 <- pl1 + xlab('Labels')+ ylab('Count')
  pl1 <- pl1 + ggtitle('Some Title') + theme(plot.title = element_text(hjust = 0.5))

  print(pl1)

However, I would like to depict the counts for all three columns on the same bar chart, not on three separate diagrams. I do not want to aggregate the counts across the three columns but to show the columns separately on the same diagram, maybe grouped by label, although I do not know whether grouping is the right choice in this case. The data format I think I need to create the desired chart:

Labels,  colA, colB, colC
label1,     1,    2,    0
label2,     0,    0,    2
label3,     1,    0,    1
label4,     1,    2,    0
label5,     1,    0,    1

Question 1: How can I reformat the data from the present format to the desired one?

Question 2: How can the data be presented on the same bar chart with the counts?

Upvotes: 1

Views: 49

Answers (2)

hpesoj626

Reputation: 3629

For your desired format, you can easily do a tidyr::gather and reshape2::dcast combination.

library(tidyverse)
library(reshape2)
df %>%
  gather(column, label) %>%
  dcast(label ~ column, fun.aggregate = length, value.var = "column")

#    label colA colB colC
# 1 label1    1    2    0
# 2 label2    0    0    2
# 3 label3    1    0    1
# 4 label4    1    2    0
# 5 label5    1    0    1
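As a side note, gather() has been superseded in tidyr 1.0.0 and later, so the same wide table can probably be built without reshape2. The following is only a sketch under that assumption; it rebuilds the sample df from the question, and the name wide_counts is made up for illustration.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(
  colA = c("label1", "label3", "label4", "label5"),
  colB = c("label1", "label1", "label4", "label4"),
  colC = c("label2", "label3", "label2", "label5")
)

wide_counts <- df %>%
  # stack all three columns into (column, label) pairs
  pivot_longer(everything(), names_to = "column", values_to = "label") %>%
  # count how often each label occurs in each column
  count(label, column) %>%
  # spread the counts back out, one column per original column;
  # label/column combinations that never occur become 0
  pivot_wider(names_from = column, values_from = n, values_fill = 0)

print(wide_counts)
```

The values_fill = 0 argument is what turns the missing combinations into zeros, matching the desired table in the question.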

This is the wide format, in R speak. When using ggplot2, it is actually a lot easier to use the long format.

df %>%
  gather(column, label) %>%
  group_by(column, label) %>%
  count()

#   column label      n
#   <chr>  <chr>  <int>
# 1 colA   label1     1
# 2 colA   label3     1
# 3 colA   label4     1
# 4 colA   label5     1
# 5 colB   label1     2
# 6 colB   label4     2
# 7 colC   label2     2
# 8 colC   label3     1
# 9 colC   label5     1

You can easily pass the result on to ggplot2 with

df %>%
  gather(column, label) %>%
  group_by(column, label) %>%
  count() %>%
  ggplot(aes(label, n)) + 
  geom_col() +
  facet_wrap(~column)

Data

df <- structure(list(colA = c("label1", "label3", "label4", "label5"
), colB = c("label1", "label1", "label4", "label4"), colC = c("label2", 
"label3", "label2", "label5")), class = "data.frame", row.names = c(NA, 
-4L))

Upvotes: 1

Prem

Reputation: 11965

One approach is to transform your data to long format using gather and then plot it:

library(dplyr)
library(tidyr)
library(ggplot2)

df %>%
  gather(column_name, column_value) %>%
  group_by(column_value, column_name) %>%
  tally() %>%
  ggplot(aes(x = column_value, y = n, fill = column_name)) +
    geom_bar(stat = "identity") +
    xlab('Labels') + 
    ylab('Count')

The final data passed to ggplot is:

#  column_value column_name     n
#1 label1       colA            1
#2 label1       colB            2
#3 label2       colC            2
#4 label3       colA            1
#5 label3       colC            1
#6 label4       colA            1
#7 label4       colB            2
#8 label5       colA            1
#9 label5       colC            1

Output plot: [image]
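Note that with a fill aesthetic and no position argument the bars above are stacked per label. If the bars should instead stand side by side within each label, as the question hints at, something like the following might work. It is a sketch, not a tested drop-in: it assumes tidyr 1.0.0 or later for pivot_longer(), and the names counts and p are invented for this example.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Sample data from the question
df <- data.frame(
  colA = c("label1", "label3", "label4", "label5"),
  colB = c("label1", "label1", "label4", "label4"),
  colC = c("label2", "label3", "label2", "label5")
)

counts <- df %>%
  pivot_longer(everything(),
               names_to = "column_name", values_to = "column_value") %>%
  count(column_value, column_name) %>%
  # add explicit zero rows for label/column pairs that never occur,
  # so every label gets three equally wide bar slots
  complete(column_value, column_name, fill = list(n = 0))

p <- ggplot(counts, aes(x = column_value, y = n, fill = column_name)) +
  geom_col(position = "dodge") +   # side-by-side bars instead of stacked
  xlab("Labels") +
  ylab("Count")

print(p)
```

Without the complete() step the plot should still draw, but labels that occur in fewer columns would get wider bars, which can make the groups harder to compare.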

Sample data:

df <- structure(list(colA = c("label1", "label3", "label4", "label5"
), colB = c("label1", "label1", "label4", "label4"), colC = c("label2", 
"label3", "label2", "label5")), .Names = c("colA", "colB", "colC"
), class = "data.frame", row.names = c(NA, -4L))

Upvotes: 1
