dnsko

Reputation: 1047

R ggplot scatterplot color multiple columns

I am trying to create a scatterplot with ggplot2 in which the point color depends on several columns. I have read about coloring a scatterplot by a single field, but how would I do this for the ggplot2movies dataset? I want to color by genre, but the genres are split across separate indicator columns:

> movies <- ggplot2movies::movies
> head(movies)
            title  year length budget rating votes    r1    r2    r3    r4    r5    r6    r7    r8    r9   r10  mpaa Action Animation Comedy Drama Documentary Romance Short
                     <chr> <int>  <dbl>  <int>  <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>  <int>     <int>  <int> <int>       <int>   <int> <int>
1                        $  1971    121     NA    6.4   348   4.5   4.5   4.5   4.5  14.5  24.5  24.5  14.5   4.5   4.5            0         0      1     1           0       0     0
2        $1000 a Touchdown  1939     71     NA    6.0    20   0.0  14.5   4.5  24.5  14.5  14.5  14.5   4.5   4.5  14.5            0         0      1     0           0       0     0
3   $21 a Day Once a Month  1941      7     NA    8.2     5   0.0   0.0   0.0   0.0   0.0  24.5   0.0  44.5  24.5  24.5            0         1      0     0           0       0     1
4                  $40,000  1996     70     NA    8.2     6  14.5   0.0   0.0   0.0   0.0   0.0   0.0   0.0  34.5  45.5            0         0      1     0           0       0     0
5 $50,000 Climax Show, The  1975     71     NA    3.4    17  24.5   4.5   0.0  14.5  14.5   4.5   0.0   0.0   0.0  24.5            0         0      0     0           0       0     0
6                    $pent  2000     91     NA    4.3    45   4.5   4.5   4.5  14.5  14.5  14.5   4.5   4.5  14.5  14.5            0         0      0     1           0       0     0

What is the best way to approach this (color based on genre)? All help is really appreciated!

Upvotes: 0

Views: 2976

Answers (1)

Jake Kaupp

Reputation: 8072

As @hrbrmstr states, you need to transform the data from wide to long. You can use tidyr::gather() in conjunction with dplyr::filter() to achieve this. This chain:

  1. Gathers the column names and values from Action to Short into two new columns, genre and flag. This turns the many genre columns (wide) into key-value pairs (long).
  2. Uses filter() to drop the superfluous rows, i.e. those where flag == 0 because the genre does not apply to that movie.
  3. Stores the resulting data frame in plot_data.

The remaining code is a simple ggplot2 scatterplot of length against rating, with the points colored by genre.

library(dplyr)
library(tidyr)
library(ggplot2)
library(ggplot2movies)

plot_data <- movies %>% 
  gather(genre, flag, Action:Short) %>% 
  filter(flag != 0)

ggplot(plot_data, aes(x = rating, y = length)) +
  geom_point(aes(color = genre), alpha = 0.4)
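
In newer versions of tidyr, gather() is superseded by pivot_longer(). Below is a minimal sketch of the same reshaping with pivot_longer(), run on a small made-up data frame so the effect on individual rows is easy to see. The toy_movies data frame and its values are hypothetical stand-ins for the movies data, not real entries.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for ggplot2movies::movies with fewer columns.
# Movie "A" has two genres, "C" has two, and "D" has none.
toy_movies <- data.frame(
  title     = c("A", "B", "C", "D"),
  rating    = c(6.4, 6.0, 8.2, 3.4),
  length    = c(121, 71, 7, 71),
  Action    = c(0L, 0L, 0L, 0L),
  Animation = c(0L, 0L, 1L, 0L),
  Comedy    = c(1L, 1L, 0L, 0L),
  Drama     = c(1L, 0L, 0L, 0L),
  Short     = c(0L, 0L, 1L, 0L)
)

toy_long <- toy_movies %>%
  # One row per (movie, genre column) pair: column name goes to genre,
  # the 0/1 value goes to flag.
  pivot_longer(Action:Short, names_to = "genre", values_to = "flag") %>%
  # Keep only the genres that actually apply to each movie.
  filter(flag != 0)

print(toy_long)
```

Two things are worth noticing in the result. A movie with several genres appears once per genre, so it is drawn as several overlapping points in the scatterplot, and a movie with no genre flag set drops out entirely. Applying the same pivot_longer() call to movies should give the same rows as the gather() version, although in a different row order.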


Upvotes: 3
