Reputation: 372
I am using the ggplot2movies library for my data, movies. Please keep in mind that I refer to both the mpaa rating and the user rating, which are two different things. In case you don't want to load the ggplot2movies library, here is a sample of the relevant data:
> head(subset(movies[,c(5,17)], movies$mpaa!=""))
# A tibble: 6 x 2
rating mpaa
<dbl> <chr>
1 5.3 R
2 7.1 PG-13
3 7.2 PG-13
4 4.9 R
5 4.8 PG-13
6 6.7 PG-13
Here I make a barplot that shows the frequency of films that have any mpaa rating:
ggplot(data=subset(movies, movies$mpaa!=""), aes(mpaa)) +
geom_bar()
Now I would like to color the bars with a fill based on the IMDb user rating. I don't want to use factor(rating) because there are an enormous number of different values in the rating column. However, when I try to use a continuous fill, as in "Assigning continuous fill color to geom_bar", I get the same graph.
ggplot(data=subset(movies, movies$mpaa!=""), aes(mpaa, fill=rating)) +
geom_bar()+
scale_fill_continuous(low="blue", high="red")
I figure it has to do with the fact that my barplot is based on the frequency of a single variable, rather than on a dataframe with a count column. I could make a new dataframe of the mpaa categories and their counts, but I'd rather know how to do this graph with the original movies dataset and a single ggplot call.
Edit: Using aes(mpaa, group = rating, fill = rating) gives a chart that is almost correct, except that the order of the colors in the bars and in the legend is swapped.
Upvotes: 3
Views: 5903
Reputation: 93821
You can reverse the legend with + guides(fill=guide_colourbar(reverse=TRUE)). However, a colour gradient doesn't seem very informative. Another option would be to cut rating into discrete ranges, as in the example below, which gives a clearer indication of the distribution of ratings within each mpaa category. Nevertheless, because of the different bar heights, it's not clear how the average rating or the distribution of ratings varies by mpaa category.
library(tidyverse)
library(ggplot2movies)
theme_set(theme_classic())
movies %>%
filter(mpaa != "") %>%
mutate(rating = fct_rev(cut(rating, seq(0,ceiling(max(rating)),2)))) %>%
ggplot(aes(mpaa, fill=rating)) +
geom_bar(colour="white", linewidth=0.2) +  # linewidth replaces size for outlines in ggplot2 >= 3.4.0
scale_fill_manual(values=c(hcl(240,100,c(30,70)), "yellow", hcl(0,100,c(70,30))))
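For completeness, here is a minimal sketch of the legend reversal mentioned at the start of this answer, applied to the continuous-fill plot from the question. The object name p_rev is only an illustrative choice; everything else comes from the question and from ggplot2's guide_colourbar().

```r
library(ggplot2)
library(ggplot2movies)

# Continuous fill as in the question's edit; grouping by rating keeps one
# stacked segment per distinct rating value within each bar.
p_rev <- ggplot(subset(movies, mpaa != ""),
                aes(mpaa, group = rating, fill = rating)) +
  geom_bar() +
  scale_fill_continuous(low = "blue", high = "red") +
  # Flip the colourbar so its direction is reversed relative to the default
  guides(fill = guide_colourbar(reverse = TRUE))

p_rev
```

This only changes the direction of the legend; the bars themselves are drawn exactly as before.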
Perhaps a boxplot or violin plot would be more informative. In the boxplot example below, the box widths are proportional to the square root of the number of movies rated, due to the varwidth=TRUE argument (I'm not that wild about this because the square-root transformation is difficult to interpret, but I thought I'd put it out there as an option). In the violin plot, the area of each violin is proportional to the number of movies in each mpaa category (due to the scale="count" argument). I've also put the number of movies in each category in the x-axis label, and marked the mean rating for each mpaa category in blue.
p = movies %>%
filter(mpaa != "") %>%
group_by(mpaa) %>%
mutate(xlab = paste0(mpaa, "\n(", format(n(), big.mark=","), ")")) %>%
ggplot(aes(xlab, rating)) +
labs(x="MPAA Rating\n(number of movies)",
y="Viewer Rating") +
scale_y_continuous(limits=c(0,10))
pl = list(geom_boxplot(varwidth=TRUE, colour="grey70"),
geom_violin(colour="grey70", scale="count",
draw_quantiles=c(0.25,0.5,0.75)),
stat_summary(fun=mean, geom="text", aes(label=sprintf("%1.1f", after_stat(y))),  # fun and after_stat() replace the deprecated fun.y and ..y..
colour="blue", size=3.5))
gridExtra::grid.arrange(p + pl[-2], p + pl[-1], ncol=2)
Upvotes: 2
Reputation: 76450
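I am not sure that the following is what you want. When coloring by rating, the default stat = "count" does not work, so I transform the data first. Note that the bar heights below are the sum of the ratings in each mpaa category, not the number of movies.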
library(ggplot2movies)
library(dplyr)
data("movies")
subset(movies, mpaa != "") %>%
group_by(mpaa) %>%
summarise(rating = sum(rating)) %>%
ggplot(aes(x = mpaa, y = rating, fill = rating)) +
geom_bar(stat = "identity") +
scale_fill_continuous(low="blue", high="red")
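If the bar height should stay the number of movies, as in the original barplot, one possible variation is to summarise both the count and the mean rating and to map the mean to the fill. This is only a sketch of that idea, not necessarily what the question is after; the names mpaa_summary, n, mean_rating and p_mean are made up for illustration.

```r
library(ggplot2)
library(ggplot2movies)
library(dplyr)

# One row per mpaa category: number of movies and mean user rating
mpaa_summary <- movies %>%
  filter(mpaa != "") %>%
  group_by(mpaa) %>%
  summarise(n = n(), mean_rating = mean(rating))

# Bar height = count, fill = mean user rating of that category
p_mean <- ggplot(mpaa_summary, aes(x = mpaa, y = n, fill = mean_rating)) +
  geom_col() +
  scale_fill_continuous(low = "blue", high = "red") +
  labs(y = "count", fill = "mean rating")

p_mean
```

Each bar gets a single colour here, so the plot shows the average rating per category but says nothing about the spread of ratings within it.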
Upvotes: 0