Reputation: 35
I have this dataframe where each cod_user has a presence (1) or absence (0) on each month. I would like to merge the values from the rows to one single row for each cod_user that gathers all the values.
# A tibble: 24 × 5
cod_user ene feb mar abr
<chr> <dbl> <dbl> <dbl> <dbl>
1 ES7823 1 NA NA NA
2 AR3442 1 NA NA NA
3 CO9382 1 NA NA NA
4 ES5611 1 NA NA NA
5 IT9982 1 NA NA NA
6 PT6628 1 NA NA NA
7 ES7823 NA 1 NA NA
8 AR3442 NA 1 NA NA
9 CO9382 NA 0 NA NA
10 ES5611 NA 1 NA NA
11 IT9982 NA 1 NA NA
12 PT6628 NA 0 NA NA
13 ES7823 NA NA 1 NA
14 AR3442 NA NA 1 NA
15 CO9382 NA NA 0 NA
16 ES5611 NA NA 1 NA
17 IT9982 NA NA 0 NA
18 PT6628 NA NA 0 NA
19 ES7823 NA NA NA 0
20 AR3442 NA NA NA 1
21 CO9382 NA NA NA 0
22 ES5611 NA NA NA 1
23 IT9982 NA NA NA 0
24 PT6628 NA NA NA 0
I want my output to look like this:
cod_user ene feb mar abr
<chr> <dbl> <dbl> <dbl> <dbl>
1 ES7823 1 1 1 0
2 AR3442 1 1 1 1
3 CO9382 1 0 0 0
Can someone help me? Thank you.
Upvotes: 2
Views: 1265
Reputation: 886948
We could use max
library(dplyr)
df1 %>%
group_by(cod_user) %>%
summarise(across(everything(), ~ max(.x, na.rm = TRUE)), .groups = 'drop')
-output
# A tibble: 6 × 5
cod_user ene feb mar abr
<chr> <int> <int> <int> <int>
1 AR3442 1 1 1 1
2 CO9382 1 0 0 0
3 ES5611 1 1 1 1
4 ES7823 1 1 1 0
5 IT9982 1 1 0 0
6 PT6628 1 0 0 0
df1 <- structure(list(cod_user = c("ES7823", "AR3442", "CO9382", "ES5611",
"IT9982", "PT6628", "ES7823", "AR3442", "CO9382", "ES5611", "IT9982",
"PT6628", "ES7823", "AR3442", "CO9382", "ES5611", "IT9982", "PT6628",
"ES7823", "AR3442", "CO9382", "ES5611", "IT9982", "PT6628"),
ene = c(1L, 1L, 1L, 1L, 1L, 1L, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), feb = c(NA,
NA, NA, NA, NA, NA, 1L, 1L, 0L, 1L, 1L, 0L, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA), mar = c(NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 1L, 1L, 0L, 1L, 0L, 0L, NA,
NA, NA, NA, NA, NA), abr = c(NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0L, 1L, 0L, 1L,
0L, 0L)), class = "data.frame", row.names = c("1", "2", "3",
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
"16", "17", "18", "19", "20", "21", "22", "23", "24"))
Upvotes: 2
Reputation: 78917
Here is an alternative approach:
library(dplyr)
coalesce_by_column <- function(df) {
return(dplyr::coalesce(!!! as.list(df)))
}
df1 %>%
group_by(cod_user) %>%
summarise(across(everything(), coalesce_by_column))
cod_user ene feb mar abr
<chr> <int> <int> <int> <int>
1 AR3442 1 1 1 1
2 CO9382 1 0 0 0
3 ES5611 1 1 1 1
4 ES7823 1 1 1 0
5 IT9982 1 1 0 0
6 PT6628 1 0 0 0
Upvotes: 2
Reputation: 76402
Reshape the data to long format, get rid of the NA
's and reshape back to wide format.
suppressPackageStartupMessages({
library(dplyr)
library(tidyr)
})
df1 %>%
pivot_longer(-cod_user) %>%
drop_na() %>%
pivot_wider(cod_user)
#> # A tibble: 6 × 5
#> cod_user ene feb mar abr
#> <chr> <int> <int> <int> <int>
#> 1 ES7823 1 1 1 0
#> 2 AR3442 1 1 1 1
#> 3 CO9382 1 0 0 0
#> 4 ES5611 1 1 1 1
#> 5 IT9982 1 1 0 0
#> 6 PT6628 1 0 0 0
Created on 2022-12-18 with reprex v2.0.2
df1 <- "cod_user ene feb mar abr
1 ES7823 1 NA NA NA
2 AR3442 1 NA NA NA
3 CO9382 1 NA NA NA
4 ES5611 1 NA NA NA
5 IT9982 1 NA NA NA
6 PT6628 1 NA NA NA
7 ES7823 NA 1 NA NA
8 AR3442 NA 1 NA NA
9 CO9382 NA 0 NA NA
10 ES5611 NA 1 NA NA
11 IT9982 NA 1 NA NA
12 PT6628 NA 0 NA NA
13 ES7823 NA NA 1 NA
14 AR3442 NA NA 1 NA
15 CO9382 NA NA 0 NA
16 ES5611 NA NA 1 NA
17 IT9982 NA NA 0 NA
18 PT6628 NA NA 0 NA
19 ES7823 NA NA NA 0
20 AR3442 NA NA NA 1
21 CO9382 NA NA NA 0
22 ES5611 NA NA NA 1
23 IT9982 NA NA NA 0
24 PT6628 NA NA NA 0"
df1 <- read.table(textConnection(df1), header = TRUE)
Created on 2022-12-18 with reprex v2.0.2
Upvotes: 3