Jared C

Reputation: 372

How to make a continuous fill in a ggplot2 bar plot with one variable

I am using the ggplot2movies package, which provides the movies dataset.

Please keep in mind that I refer to mpaa rating and user rating, which are two different things. In case you don't want to load the ggplot2movies library, here is a sample of the relevant data:

> head(subset(movies[,c(5,17)], movies$mpaa!=""))
# A tibble: 6 x 2
  rating mpaa 
   <dbl> <chr>
1    5.3 R    
2    7.1 PG-13
3    7.2 PG-13
4    4.9 R    
5    4.8 PG-13
6    6.7 PG-13

Here I make a barplot that shows the frequency of films that have any mpaa rating:

ggplot(data=subset(movies, movies$mpaa!=""), aes(mpaa)) +
  geom_bar()


Now I would like to color in the bars with a fill based on the IMDb user rating. I don't want to use factor(rating) because there are an enormous number of distinct values in the rating column. However, when I try to use a continuous fill, as in the question "Assigning continuous fill color to geom_bar", I get the same graph as before.

ggplot(data=subset(movies, movies$mpaa!=""), aes(mpaa, fill=rating)) +
  geom_bar()+ 
  scale_fill_continuous(low="blue", high="red")

I figure it has to do with the fact that my barplot is based on the frequency of a single variable, rather than a dataframe with a count column. I could make a new dataframe of the mpaa categories and their counts, but I'd rather know how to do this graph with the original movies dataset and a single ggplot.

Edit: Using aes(mpaa, group = rating, fill = rating) gives a chart that is almost correct, except that the order of the colours in the bars is the reverse of their order in the legend.

Upvotes: 3

Views: 5903

Answers (2)

eipi10

Reputation: 93821

You can reverse the legend with + guides(fill=guide_colourbar(reverse=TRUE)). However, a colour gradient doesn't seem very informative. Another option would be to cut rating into discrete ranges, as in the example below, which gives a clearer indication of the distribution of ratings within each mpaa category. Even so, because of the different bar heights, it's not clear how the average rating or the distribution of ratings varies by mpaa category.

library(tidyverse)
library(ggplot2movies)
theme_set(theme_classic())

movies %>% 
  filter(mpaa != "") %>% 
  mutate(rating = fct_rev(cut(rating, seq(0,ceiling(max(rating)),2)))) %>% 
  ggplot(aes(mpaa, fill=rating)) +
    geom_bar(colour="white", linewidth=0.2) + 
    scale_fill_manual(values=c(hcl(240,100,c(30,70)), "yellow", hcl(0,100,c(70,30))))
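For reference, the legend reversal mentioned at the start of this answer could look roughly like the sketch below. It is only an illustration: toy is a hypothetical data frame holding the six sample rows from the question rather than the full movies dataset, p_grad is an arbitrary name, and scale_fill_gradient() is used here as an explicit stand-in for the question's scale_fill_continuous(low=, high=).

```r
library(ggplot2)

# Toy data (assumption): the six sample rows printed in the question,
# not the full movies dataset
toy <- data.frame(
  rating = c(5.3, 7.1, 7.2, 4.9, 4.8, 6.7),
  mpaa   = c("R", "PG-13", "PG-13", "R", "PG-13", "PG-13")
)

# group = rating makes each distinct rating its own stacked segment;
# reverse = TRUE flips the colourbar so it reads in the same order as the stack
p_grad <- ggplot(toy, aes(mpaa, group = rating, fill = rating)) +
  geom_bar() +
  scale_fill_gradient(low = "blue", high = "red") +
  guides(fill = guide_colourbar(reverse = TRUE))

p_grad
```

With the real data, replacing toy by subset(movies, mpaa != "") should give the chart from the question's edit with the legend flipped.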


Perhaps a boxplot or violin plot would be more informative. In the boxplot example below, the box widths are proportional to the square root of the number of movies rated, due to the varwidth=TRUE argument (I'm not that wild about this because the square-root transformation is difficult to interpret, but I thought I'd put it out there as an option). In the violin plot, the area of each violin is proportional to the number of movies in each mpaa category (due to the scale="count" argument). I've also put the number of movies in each category in the x-axis label, and marked in blue the mean rating for each mpaa category.

p = movies %>% 
  filter(mpaa != "") %>% 
  group_by(mpaa) %>% 
  mutate(xlab = paste0(mpaa, "\n(", format(n(), big.mark=","), ")")) %>% 
  ggplot(aes(xlab, rating)) +
    labs(x="MPAA Rating\n(number of movies)", 
         y="Viewer Rating") +
    scale_y_continuous(limits=c(0,10))

pl = list(geom_boxplot(varwidth=TRUE, colour="grey70"),
          geom_violin(colour="grey70", scale="count",
                      draw_quantiles=c(0.25,0.5,0.75)),
          stat_summary(fun=mean, geom="text", aes(label=sprintf("%1.1f", after_stat(y))), 
                       colour="blue", size=3.5))  

gridExtra::grid.arrange(p + pl[-2], p + pl[-1], ncol=2)
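The counts in the axis labels and the blue mean labels can also be checked as a plain table. The sketch below is a minimal illustration, assuming a hypothetical toy data frame built from the six sample rows in the question; the column names n and mean_rating are chosen here for illustration only.

```r
library(dplyr)

# Toy data (assumption): the six sample rows printed in the question
toy <- data.frame(
  rating = c(5.3, 7.1, 7.2, 4.9, 4.8, 6.7),
  mpaa   = c("R", "PG-13", "PG-13", "R", "PG-13", "PG-13")
)

# One row per mpaa category: number of movies and mean viewer rating
toy_summary <- toy %>%
  group_by(mpaa) %>%
  summarise(n = n(), mean_rating = mean(rating))

toy_summary
```

On the full dataset the same pipeline would start from movies %>% filter(mpaa != "").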


Upvotes: 2

Rui Barradas

Reputation: 76450

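I am not sure that the following is what you want.
When filling by rating, the default stat = "count" does not work: each bar aggregates many rows with different rating values, so there is no single value left to map to a continuous fill. So I summarise the data first (here, by summing rating within each mpaa category) and plot the result with stat = "identity".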

library(ggplot2movies)
library(dplyr)

data("movies")

subset(movies, mpaa != "") %>%
  group_by(mpaa) %>%
  summarise(rating = sum(rating)) %>%
  ggplot(aes(x = mpaa, y = rating, fill = rating)) +
  geom_bar(stat = "identity") +
  scale_fill_continuous(low="blue", high="red")
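Note that the code above maps the summed rating to both the bar height and the fill. If the goal is instead to keep the bar height as the number of movies and use the fill for the average rating, a variant might look like the sketch below. This is an assumption about what is wanted, not something stated in the question; toy is a hypothetical data frame with the six sample rows from the question, and n, mean_rating and p_mean are illustrative names.

```r
library(ggplot2)
library(dplyr)

# Toy data (assumption): the six sample rows printed in the question
toy <- data.frame(
  rating = c(5.3, 7.1, 7.2, 4.9, 4.8, 6.7),
  mpaa   = c("R", "PG-13", "PG-13", "R", "PG-13", "PG-13")
)

# Height = number of movies per category, fill = mean rating per category
p_mean <- toy %>%
  group_by(mpaa) %>%
  summarise(n = n(), mean_rating = mean(rating)) %>%
  ggplot(aes(x = mpaa, y = n, fill = mean_rating)) +
  geom_col() +   # geom_col() is geom_bar(stat = "identity")
  scale_fill_gradient(low = "blue", high = "red")

p_mean
```

This still summarises the data before plotting, but it happens inside a single pipeline, so no separate data frame has to be stored.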


Upvotes: 0
