I am attempting to create a scatterplot with ggplot using multiple fields. I have read about these scatterplots and about coloring by a field, but I was wondering how I would do this for the ggplot2movies dataset. I want to color based on genre, but the genres are split across separate columns:
> movies <- ggplot2movies::movies
> head(movies)
title year length budget rating votes r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 mpaa Action Animation Comedy Drama Documentary Romance Short
<chr> <int> <dbl> <int> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <int> <int> <int> <int> <int> <int> <int>
1 $ 1971 121 NA 6.4 348 4.5 4.5 4.5 4.5 14.5 24.5 24.5 14.5 4.5 4.5 0 0 1 1 0 0 0
2 $1000 a Touchdown 1939 71 NA 6.0 20 0.0 14.5 4.5 24.5 14.5 14.5 14.5 4.5 4.5 14.5 0 0 1 0 0 0 0
3 $21 a Day Once a Month 1941 7 NA 8.2 5 0.0 0.0 0.0 0.0 0.0 24.5 0.0 44.5 24.5 24.5 0 1 0 0 0 0 1
4 $40,000 1996 70 NA 8.2 6 14.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 34.5 45.5 0 0 1 0 0 0 0
5 $50,000 Climax Show, The 1975 71 NA 3.4 17 24.5 4.5 0.0 14.5 14.5 4.5 0.0 0.0 0.0 24.5 0 0 0 0 0 0 0
6 $pent 2000 91 NA 4.3 45 4.5 4.5 4.5 14.5 14.5 14.5 4.5 4.5 14.5 14.5 0 0 0 1 0 0 0
What is the best way to approach this (color based on genre)? All help is really appreciated!
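As @hrbrmstr states, you need to transform the data from wide to long. You can use tidyr::gather() in conjunction with dplyr::filter() to achieve this. This chain:
1. gathers the genre columns Action through Short into two columns, genre and flag. This moves the many columns (wide) into a key-value pair (long).
2. filters out the rows where a movie does not belong to that genre (those where flag == 0).
3. assigns the result to plot_data.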
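To see what those two steps do to the rows, here is a package-free base R illustration on a small hypothetical data frame. The toy titles, ratings and the three genre columns are made up, and this is not the answer's code, only a manual stand-in for gather() followed by filter(flag != 0):

```r
# Hypothetical toy data: three titles with 0/1 genre flags, mimicking the
# wide layout of ggplot2movies::movies (values are illustrative only)
toy <- data.frame(
  title  = c("A", "B", "C"),
  rating = c(6.4, 6.0, 8.2),
  Action = c(0L, 1L, 0L),
  Comedy = c(1L, 1L, 0L),
  Drama  = c(1L, 0L, 0L)
)

genre_cols <- c("Action", "Comedy", "Drama")

# Step 1 (like gather): one row per (title, genre) pair, the 0/1 value goes in `flag`
long <- do.call(rbind, lapply(genre_cols, function(g) {
  data.frame(title = toy$title, rating = toy$rating, genre = g, flag = toy[[g]])
}))

# Step 2 (like filter(flag != 0)): keep only the genres each title is flagged with
long <- long[long$flag != 0, ]

print(long[order(long$title), c("title", "genre")])
```

Titles flagged in two genres (A and B) end up as two rows each, and title C, flagged in no genre, disappears. In the real plot this means a multi-genre movie is drawn once per genre, and movies with no genre flag are not plotted at all.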
The remaining code is a simple ggplot2 scatterplot of length vs rating.
library(dplyr)
library(tidyr)
library(ggplot2)
library(ggplot2movies)
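plot_data <- movies %>%
  # reshape the seven 0/1 genre columns (Action through Short) into genre/flag pairs
  gather(genre, flag, Action:Short) %>%
  # keep only the genres each movie is actually flagged with
  filter(flag != 0)
# color each point by genre; alpha = 0.4 softens the heavy overplotting
ggplot(plot_data, aes(x = rating, y = length)) +
  geom_point(aes(color = genre), alpha = 0.4)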
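Since tidyr 1.0.0, gather() is superseded by pivot_longer(). A sketch of the same chain with pivot_longer(), assuming tidyr >= 1.0.0 is installed, would look like this. The names_to/values_to arguments take over the roles of the genre and flag arguments of gather(). The plot object name p is just for illustration:

```r
library(dplyr)
library(tidyr)    # pivot_longer() requires tidyr >= 1.0.0
library(ggplot2)
library(ggplot2movies)

plot_data <- movies %>%
  # the seven 0/1 genre columns become `genre` (former column name) and `flag` (its value)
  pivot_longer(Action:Short, names_to = "genre", values_to = "flag") %>%
  # keep only the genres each movie is actually flagged with
  filter(flag != 0)

p <- ggplot(plot_data, aes(x = rating, y = length)) +
  geom_point(aes(color = genre), alpha = 0.4)
print(p)
```

The output data is the same long layout, so the ggplot2 call does not change.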