hoze

Reputation: 11

Which ggplot2 geom should I use?

I have the following data frame:

id <- 1:5
count_big <- c(15, 25, 7, 0, 12)
count_small <- c(15, 9, 22, 11, 14)
count_black <- c(7, 12, 5, 2, 6)
count_yellow <- c(2, 0, 7, 4, 3)
count_red <- c(8, 4, 4, 2, 5)
count_blue <- c(5, 9, 6, 1, 7)
count_green <- c(8, 9, 7, 2, 5)
df <- data.frame(id, count_big, count_small, count_black, count_yellow, count_red, count_blue, count_green)

How can I display the following in ggplot2 and which geom should I use:

This is just a subset of the data set that has around 1000 rows.

Can I use this df in ggplot2 as it is, or do I need to transform it into tidy data with tidyr first? (I don't know data.table yet.)

Upvotes: 1

Views: 65

Answers (1)

Jordi

Reputation: 1343

You first need to reshape the data from wide to long format with tidyr, so that each row holds one id, one category name, and one count.

library(tidyr)
library(ggplot2)
df <- gather(df, var, value, starts_with("count"))

# strip the "count_" prefix so var holds just the category name
df$var <- sub("^count_", "", df$var)

# plot big vs small
df_size <- subset(df, var %in% c("big", "small"))
ggplot(df_size, aes(x = id, y = value, fill = var)) +
  geom_bar(stat = "identity", position = position_dodge())

# same routine for colors
df_color <- subset(df, !(var %in% c("big", "small")))
ggplot(df_color, aes(x = id, y = value, fill = var)) +
  geom_bar(stat = "identity", position = position_dodge())

stat = "identity" makes geom_bar() use the y values as the bar heights instead of counting rows, which is its default (stat = "count"). position = position_dodge() places the bars for each id next to each other rather than stacking them.
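In recent tidyr releases gather() is superseded by pivot_longer(), and geom_col() is ggplot2's shorthand for geom_bar(stat = "identity"). A minimal sketch of the same approach with those functions, assuming tidyr >= 1.0.0 and ggplot2 >= 2.2.0; the names df_long and p_size are only illustrative and the data are the five rows from the question:

```r
library(tidyr)
library(ggplot2)

# Wide data frame from the question (the five-row subset)
df <- data.frame(
  id           = 1:5,
  count_big    = c(15, 25, 7, 0, 12),
  count_small  = c(15, 9, 22, 11, 14),
  count_black  = c(7, 12, 5, 2, 6),
  count_yellow = c(2, 0, 7, 4, 3),
  count_red    = c(8, 4, 4, 2, 5),
  count_blue   = c(5, 9, 6, 1, 7),
  count_green  = c(8, 9, 7, 2, 5)
)

# Reshape wide -> long; names_prefix drops "count_" in the same step
df_long <- pivot_longer(
  df,
  cols         = starts_with("count_"),
  names_to     = "var",
  names_prefix = "count_",
  values_to    = "value"
)

# Big vs. small, with dodged bars whose heights are the values themselves
df_size <- subset(df_long, var %in% c("big", "small"))
p_size <- ggplot(df_size, aes(x = id, y = value, fill = var)) +
  geom_col(position = "dodge")
```

Printing p_size draws the plot. The color plot follows the same pattern with subset(df_long, !(var %in% c("big", "small"))).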

Upvotes: 1
