Reputation: 657
I have a dataframe with more than 190,000 rows, something a bit like this:
library(tibble)
mydf <- tribble(~col1, ~col2, ~col3, ~col4, ~col5,
"A", 16, 45, 53, 35,
"A", 17, 12, 54, 12,
"A", 19, 12, 54, 35,
"B", 10, 87, 55, 22,
"B", 10, 87, 55, 22,
"B", 12, 23, 12, 67)
There are repeated iterations of col1
; some have the same values across the columns, others have different values across columns, as shown in the sample dataframe.
For each repeated level in col1
, I want to aggregate these values into one row showing the mean of all the rows. I got so far using this answer, however, this leaves me with all the distinct rows:
mydf %>% group_by(col1) %>%
mutate_each(funs(mean), -(1)) %>%
distinct()
# A tibble: 5 x 5
# Groups: col1 [2]
col1 col2 col3 col4 col5
<chr> <dbl> <dbl> <dbl> <dbl>
1 A 16 23 53.7 27.3
2 A 17 23 53.7 27.3
3 A 19 23 53.7 27.3
4 B 10 65.7 40.7 37
5 B 12 65.7 40.7 37
What I actually want here is one row for A
, B
, etc, showing the mean values.
Upvotes: 0
Views: 48
Reputation: 25
If cols 2 through 5 are all observations of the same measurement, it makes sense to use tidyr::pivot_longer()
. Then take group by col1 and the new cols 2 through 5 group. Finally, to get in the form you want, use tidyr::pivot_wider()
library(tidyverse)
mydf <- tribble(~col1, ~col2, ~col3, ~col4, ~col5,
"A", 16, 45, 53, 35,
"A", 17, 12, 54, 12,
"A", 19, 12, 54, 35,
"B", 10, 87, 55, 22,
"B", 10, 87, 55, 22,
"B", 12, 23, 12, 67)
mydf %>%
pivot_longer(cols = -col1) %>%
group_by(col1, name) %>%
summarise(mean = mean(value)) %>%
pivot_wider(names_from = name, values_from = mean)
#> # A tibble: 2 x 5
#> # Groups: col1 [2]
#> col1 col2 col3 col4 col5
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 A 17.3 23 53.7 27.3
#> 2 B 10.7 65.7 40.7 37
Created on 2020-03-20 by the reprex package (v0.3.0)
Upvotes: 0
Reputation: 8072
You need to use summarize
instead of mutate
, to summarize the grouped values. In this case I'm using summarize_all
to summarize all non-grouped values.
library(tidyverse)
mydf <- tribble(~col1, ~col2, ~col3, ~col4, ~col5,
"A", 16, 45, 53, 35,
"A", 17, 12, 54, 12,
"A", 19, 12, 54, 35,
"B", 10, 87, 55, 22,
"B", 10, 87, 55, 22,
"B", 12, 23, 12, 67)
mydf %>%
group_by(col1) %>%
summarize_all(.funs = list(mean))
# A tibble: 2 x 5
col1 col2 col3 col4 col5
<chr> <dbl> <dbl> <dbl> <dbl>
1 A 17.3 23 53.7 27.3
2 B 10.7 65.7 40.7 37
Upvotes: 1