Reputation: 786
The question refers to reformatting a data frame (df) so that three bar charts can be displayed on the same diagram with ggplot2. Thanks in advance for any responses!
The data I have in df:
colA, colB, colC
label1, label1, label2
label3, label1, label3
label4, label4, label2
label5, label4, label5
With this data I can create a bar chart for each column using the code below, which shows the count of each label in the given column.
library(ggplot2)
# one bar per label in colA; bar height = number of rows with that label
pl <- ggplot(df, aes(x = colA))
pl1 <- pl + geom_bar()
pl1 <- pl1 + theme(axis.text.x = element_text(angle = 90, hjust = 1))
pl1 <- pl1 + xlab('Labels') + ylab('Count')
pl1 <- pl1 + ggtitle('Some Title') + theme(plot.title = element_text(hjust = 0.5))
print(pl1)
However, I would like to show the counts for all three columns on the same bar chart rather than on three separate diagrams. I do not want to aggregate the counts across the three columns; instead, I want to show each column separately on the same diagram, perhaps grouped by label, although I am not sure whether grouping is the right choice here. I think I need the data in the following format to create the desired chart:
Labels, colA, colB, colC
label1, 1, 2, 0
label2, 0, 0, 2
label3, 1, 0, 1
label4, 1, 2, 0
label5, 1, 0, 1
Question 1: How can I reformat the data from the present format to the desired one?
Question 2: How can the data be presented on the same bar chart with the counts?
Upvotes: 1
Views: 49
Reputation: 3629
For your desired format, you can easily combine tidyr::gather and reshape2::dcast.
library(tidyverse)
library(reshape2)
df %>%
gather(column, label) %>%
dcast(label ~ column, fun.aggregate = length, value.var = "column")
# label colA colB colC
# 1 label1 1 2 0
# 2 label2 0 0 2
# 3 label3 1 0 1
# 4 label4 1 2 0
# 5 label5 1 0 1
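If you would rather not pull in reshape2, the same wide table can also be built in base R. This is a minimal sketch that is not part of the original answer; it assumes every column holds plain character labels, and the helper name `labs` is just an illustrative choice.

```r
# Sample data from the question
df <- data.frame(
  colA = c("label1", "label3", "label4", "label5"),
  colB = c("label1", "label1", "label4", "label4"),
  colC = c("label2", "label3", "label2", "label5"),
  stringsAsFactors = FALSE
)

# Every label that occurs in any column (hypothetical helper name `labs`)
labs <- sort(unique(unlist(df, use.names = FALSE)))

# Count each label per column; factor() with fixed levels keeps zero counts,
# and sapply() binds the per-column tables into a labels x columns matrix
wide <- sapply(df, function(x) table(factor(x, levels = labs)))
print(wide)
```

Fixing the factor levels up front is what makes the rows line up across columns, so labels such as label2 get an explicit 0 in colA and colB.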
This is the wide format, in R speak. When using ggplot2, it is actually a lot easier to use the long format.
df %>%
gather(column, label) %>%
group_by(column, label) %>%
count()
# column label n
# <chr> <chr> <int>
# 1 colA label1 1
# 2 colA label3 1
# 3 colA label4 1
# 4 colA label5 1
# 5 colB label1 2
# 6 colB label4 2
# 7 colC label2 2
# 8 colC label3 1
# 9 colC label5 1
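In recent tidyr versions gather() is superseded by pivot_longer(). The sketch below is an alternative to the pipeline above, not the answer's original code, and it assumes tidyr 1.0.0 or later. It should give the same nine (column, label) counts.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(
  colA = c("label1", "label3", "label4", "label5"),
  colB = c("label1", "label1", "label4", "label4"),
  colC = c("label2", "label3", "label2", "label5"),
  stringsAsFactors = FALSE
)

long <- df %>%
  # stack all columns: column names go to `column`, cell values to `label`
  pivot_longer(everything(), names_to = "column", values_to = "label") %>%
  # one row per (column, label) pair, with its count in `n`
  count(column, label)
print(long)
```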
You can easily pass the result on to ggplot2 with:
df %>%
gather(column, label) %>%
group_by(column, label) %>%
count() %>%
ggplot(aes(label, n)) +
geom_col() +
facet_wrap(~column)
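The facets above draw one panel per column. If you want all three columns as grouped bars side by side in a single panel, as the question suggests, a dodged bar chart is one option. This sketch is not from the original answer; it uses tidyr::complete() to add explicit zero counts so that each label gets one bar slot per column.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Sample data from the question
df <- data.frame(
  colA = c("label1", "label3", "label4", "label5"),
  colB = c("label1", "label1", "label4", "label4"),
  colC = c("label2", "label3", "label2", "label5"),
  stringsAsFactors = FALSE
)

counts <- df %>%
  gather(column, label) %>%
  count(column, label) %>%
  # add rows with n = 0 for label/column pairs that never occur
  complete(column, label, fill = list(n = 0L))

p <- ggplot(counts, aes(x = label, y = n, fill = column)) +
  geom_col(position = "dodge") +  # bars for each column placed side by side
  xlab("Labels") +
  ylab("Count")
print(p)
```

Without the zero rows, labels that appear in only one column would get a single full-width bar, which makes the groups harder to compare.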
Data
df <- structure(list(colA = c("label1", "label3", "label4", "label5"
), colB = c("label1", "label1", "label4", "label4"), colC = c("label2",
"label3", "label2", "label5")), class = "data.frame", row.names = c(NA,
-4L))
Upvotes: 1
Reputation: 11965
One approach is to transform your data into long format using gather and then plot it:
library(dplyr)
library(tidyr)
library(ggplot2)
df %>%
gather(column_name, column_value) %>%
group_by(column_value, column_name) %>%
tally() %>%
ggplot(aes(x = column_value, y = n, fill = column_name)) +
geom_bar(stat = "identity") +
xlab('Labels') +
ylab('Count')
where the final data passed to ggplot is:
# column_value column_name n
#1 label1 colA 1
#2 label1 colB 2
#3 label2 colC 2
#4 label3 colA 1
#5 label3 colC 1
#6 label4 colA 1
#7 label4 colB 2
#8 label5 colA 1
#9 label5 colC 1
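Because the question also asks how to show the counts on the chart, one option is to print each count inside its stacked segment with geom_text(). This is a sketch built on the answer's data, not part of the original answer; geom_col() is used as the shorthand for geom_bar(stat = "identity").

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Sample data from the question
df <- data.frame(
  colA = c("label1", "label3", "label4", "label5"),
  colB = c("label1", "label1", "label4", "label4"),
  colC = c("label2", "label3", "label2", "label5"),
  stringsAsFactors = FALSE
)

counts <- df %>%
  gather(column_name, column_value) %>%
  count(column_value, column_name)

p <- ggplot(counts, aes(x = column_value, y = n, fill = column_name)) +
  geom_col() +  # stacked bars, one segment per column
  # place each count label at the vertical middle of its segment
  geom_text(aes(label = n), position = position_stack(vjust = 0.5)) +
  xlab("Labels") +
  ylab("Count")
print(p)
```

Both layers are grouped by `column_name` through the inherited `fill` aesthetic, so the text labels stack in the same order as the bar segments.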
Sample data:
df <- structure(list(colA = c("label1", "label3", "label4", "label5"
), colB = c("label1", "label1", "label4", "label4"), colC = c("label2",
"label3", "label2", "label5")), .Names = c("colA", "colB", "colC"
), class = "data.frame", row.names = c(NA, -4L))
Upvotes: 1