Reputation: 1642
I have the following dataset:
Lines <- "id time sex Age A
1 1 male 90 0
1 2 male 91 0
1 3 male 92 1
2 1 female 87 0
2 2 female 88 0
2 3 female 89 0
3 1 male 50 0
3 2 male 51 1
3 3 male 52 0
4 1 female 54 0
4 2 female 55 0
4 3 female 56 0"
I would like to obtain the mean value for the numeric variable and the highest value for the binary variables (e.g. if 0 and 1, then 1). And for the string variables, just the name as this does not vary across id.
The resulting data frame should be something like this:
Lines <- "id time sex Age A
1 2 male 91 1
2 2 female 88 0
3 2 male 51 1
4 2 female 55 0"
I have more or less an idea:
Lines <- Lines %>%
group_by(id, across(where(is.character))) %>%
summarise(across(where(all(column %in% 0:1)), max)) %>%
summarise(across(where(is.numeric), mean)) %>%
ungroup
Upvotes: 1
Views: 66
Reputation: 21928
I thought about using across
function to apply mean
on only numeric variables in one try, but I noticed they are actually stored as integer
as well as your binary variable. So we either have to change their classes before our solution and use across
or explicitly name every variable like this:
library(dplyr)
df %>%
group_by(id) %>%
summarise(id = first(id), sex = first(sex), time = mean(time),
Age = mean(Age), A = max(A))
# A tibble: 4 x 5
id sex time Age A
<int> <chr> <dbl> <dbl> <int>
1 1 male 2 91 1
2 2 female 2 88 0
3 3 male 2 51 1
4 4 female 2 55 0
Here is another solution, however I coerced some variables into classes mentioned in the question before applying our transformation. It would be a good idea if you could do it too:
df %>%
group_by(id) %>%
mutate(across(c(time, Age), as.numeric)) %>%
summarise(id = first(id), sex = first(sex),
across(where(is.integer), max, .names = "{.col}"),
across(where(is.double), mean, .names = "{.col}"))
# A tibble: 4 x 5
id sex A time Age
<int> <chr> <int> <dbl> <dbl>
1 1 male 1 2 91
2 2 female 0 2 88
3 3 male 1 2 51
4 4 female 0 2 55
Upvotes: 2