Reputation: 35
I am struggling with the tidyverse. I'm using the mpg dataset (shipped with ggplot2) to illustrate the issue I'm facing (ignore whether the relationships are meaningful, it is just for the sake of explaining my problem).
What I'm trying to do is obtain the average "displ" grouped by manufacturer and year AND, at the same time (this is what I can't figure out), get one column per fuel type (i.e. a column for the mean of diesel, a column for the mean of petrol, etc.).
This is the first part of the code. I'm new to R, so I really don't know what I need to add...
mpg %>%
  group_by(manufacturer, year) %>%
  summarize(Mean. = mean(displ))
# A tibble: 30 × 3
# Groups:   manufacturer [15]
   manufacturer  year Mean.
   <chr>        <int> <dbl>
 1 audi          1999  2.36
 2 audi          2008  2.73
 3 chevrolet     1999  4.97
 4 chevrolet     2008  5.12
 5 dodge         1999  4.32
 6 dodge         2008  4.42
 7 ford          1999  4.45
 8 ford          2008  4.66
 9 honda         1999  1.6
10 honda         2008  1.85
# … with 20 more rows
Any help is appreciated, thank you.
Upvotes: 0
Views: 153
Reputation: 886938
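Perhaps we need to reshape the data into 'wide' format with pivot_wider, using values_fn = mean so that displ is averaged for each manufacturer, year and fuel type: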
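library(dplyr)
library(tidyr)
library(ggplot2)  # provides the mpg dataset

mpg %>%
  select(manufacturer, year, fl, displ) %>%
  # one column per fuel type; duplicates within a cell are averaged
  pivot_wider(names_from = fl, values_from = displ, values_fn = mean)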
Output:
# A tibble: 30 x 7
   manufacturer  year     p     r     e     d     c
   <chr>        <int> <dbl> <dbl> <dbl> <dbl> <dbl>
 1 audi          1999  2.36    NA    NA    NA    NA
 2 audi          2008  2.73    NA    NA    NA    NA
 3 chevrolet     2008  6.47  4.49   5.3    NA    NA
 4 chevrolet     1999   5.7  4.22    NA   6.5    NA
 5 dodge         1999    NA  4.32    NA    NA    NA
 6 dodge         2008    NA  4.42  4.42    NA    NA
 7 ford          1999    NA  4.45    NA    NA    NA
 8 ford          2008   5.4  4.58    NA    NA    NA
 9 honda         1999   1.6   1.6    NA    NA    NA
10 honda         2008     2   1.8    NA    NA   1.8
# … with 20 more rows
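The wide table has one column per fuel type, but it no longer contains the overall mean per manufacturer and year that the question also asks for. One possible way to get both is to compute the two summaries separately and join them. This is only a sketch: the object names (overall, by_fuel, result) and the mean_displ_ column prefix are illustrative choices, not anything from the question.

```r
library(dplyr)
library(tidyr)
library(ggplot2)  # provides the mpg dataset

# Overall mean displacement per manufacturer and year
overall <- mpg %>%
  group_by(manufacturer, year) %>%
  summarise(mean_displ = mean(displ), .groups = "drop")

# Mean displacement per fuel type, spread into one column per fl value
# (the "mean_displ_" prefix is an arbitrary naming choice)
by_fuel <- mpg %>%
  group_by(manufacturer, year, fl) %>%
  summarise(mean_displ = mean(displ), .groups = "drop") %>%
  pivot_wider(names_from = fl, values_from = mean_displ,
              names_prefix = "mean_displ_")

# Combine: one row per manufacturer/year, overall mean plus fuel-type means
result <- left_join(overall, by_fuel, by = c("manufacturer", "year"))
result
```

Fuel types that do not occur for a given manufacturer and year come out as NA, just as in the pivot_wider output above.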
Upvotes: 2