IPLNAD
IPLNAD

Reputation: 35

Is there a way in R to merge rows based on condition?

I have this dataframe where each cod_user has a presence (1) or absence (0) on each month. I would like to merge the values from the rows to one single row for each cod_user that gathers all the values.

# A tibble: 24 × 5
   cod_user   ene   feb   mar   abr
   <chr>    <dbl> <dbl> <dbl> <dbl>
 1 ES7823       1    NA    NA    NA
 2 AR3442       1    NA    NA    NA
 3 CO9382       1    NA    NA    NA
 4 ES5611       1    NA    NA    NA
 5 IT9982       1    NA    NA    NA
 6 PT6628       1    NA    NA    NA
 7 ES7823      NA     1    NA    NA
 8 AR3442      NA     1    NA    NA
 9 CO9382      NA     0    NA    NA
10 ES5611      NA     1    NA    NA
11 IT9982      NA     1    NA    NA
12 PT6628      NA     0    NA    NA
13 ES7823      NA    NA     1    NA
14 AR3442      NA    NA     1    NA
15 CO9382      NA    NA     0    NA
16 ES5611      NA    NA     1    NA
17 IT9982      NA    NA     0    NA
18 PT6628      NA    NA     0    NA
19 ES7823      NA    NA    NA     0
20 AR3442      NA    NA    NA     1
21 CO9382      NA    NA    NA     0
22 ES5611      NA    NA    NA     1
23 IT9982      NA    NA    NA     0
24 PT6628      NA    NA    NA     0

I want my output to look like this:

  cod_user   ene   feb   mar   abr
  <chr>    <dbl> <dbl> <dbl> <dbl>
1 ES7823       1     1     1     0
2 AR3442       1     1     1     1
3 CO9382       1     0     0     0

Can someone help me? Thank you.

Upvotes: 2

Views: 1265

Answers (3)

akrun
akrun

Reputation: 886948

We could use max

library(dplyr)
df1 %>% 
  group_by(cod_user) %>% 
  summarise(across(everything(), ~ max(.x, na.rm = TRUE)), .groups = 'drop')

-output

# A tibble: 6 × 5
  cod_user   ene   feb   mar   abr
  <chr>    <int> <int> <int> <int>
1 AR3442       1     1     1     1
2 CO9382       1     0     0     0
3 ES5611       1     1     1     1
4 ES7823       1     1     1     0
5 IT9982       1     1     0     0
6 PT6628       1     0     0     0

data

df1 <- structure(list(cod_user = c("ES7823", "AR3442", "CO9382", "ES5611", 
"IT9982", "PT6628", "ES7823", "AR3442", "CO9382", "ES5611", "IT9982", 
"PT6628", "ES7823", "AR3442", "CO9382", "ES5611", "IT9982", "PT6628", 
"ES7823", "AR3442", "CO9382", "ES5611", "IT9982", "PT6628"), 
    ene = c(1L, 1L, 1L, 1L, 1L, 1L, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), feb = c(NA, 
    NA, NA, NA, NA, NA, 1L, 1L, 0L, 1L, 1L, 0L, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA), mar = c(NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, 1L, 1L, 0L, 1L, 0L, 0L, NA, 
    NA, NA, NA, NA, NA), abr = c(NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0L, 1L, 0L, 1L, 
    0L, 0L)), class = "data.frame", row.names = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", 
"16", "17", "18", "19", "20", "21", "22", "23", "24"))

Upvotes: 2

TarJae
TarJae

Reputation: 78917

Here is an alternative approach:

library(dplyr)

coalesce_by_column <- function(df) {
  return(dplyr::coalesce(!!! as.list(df)))
}

df1 %>%
  group_by(cod_user) %>% 
  summarise(across(everything(), coalesce_by_column))       
  cod_user   ene   feb   mar   abr
  <chr>    <int> <int> <int> <int>
1 AR3442       1     1     1     1
2 CO9382       1     0     0     0
3 ES5611       1     1     1     1
4 ES7823       1     1     1     0
5 IT9982       1     1     0     0
6 PT6628       1     0     0     0

Upvotes: 2

Rui Barradas
Rui Barradas

Reputation: 76402

Reshape the data to long format, get rid of the NA's and reshape back to wide format.

suppressPackageStartupMessages({
  library(dplyr)
  library(tidyr)
})

df1 %>%
  pivot_longer(-cod_user) %>%
  drop_na() %>%
  pivot_wider(cod_user)
#> # A tibble: 6 × 5
#>   cod_user   ene   feb   mar   abr
#>   <chr>    <int> <int> <int> <int>
#> 1 ES7823       1     1     1     0
#> 2 AR3442       1     1     1     1
#> 3 CO9382       1     0     0     0
#> 4 ES5611       1     1     1     1
#> 5 IT9982       1     1     0     0
#> 6 PT6628       1     0     0     0

Created on 2022-12-18 with reprex v2.0.2


Data

df1 <- "cod_user   ene   feb   mar   abr
 1 ES7823       1    NA    NA    NA
 2 AR3442       1    NA    NA    NA
 3 CO9382       1    NA    NA    NA
 4 ES5611       1    NA    NA    NA
 5 IT9982       1    NA    NA    NA
 6 PT6628       1    NA    NA    NA
 7 ES7823      NA     1    NA    NA
 8 AR3442      NA     1    NA    NA
 9 CO9382      NA     0    NA    NA
10 ES5611      NA     1    NA    NA
11 IT9982      NA     1    NA    NA
12 PT6628      NA     0    NA    NA
13 ES7823      NA    NA     1    NA
14 AR3442      NA    NA     1    NA
15 CO9382      NA    NA     0    NA
16 ES5611      NA    NA     1    NA
17 IT9982      NA    NA     0    NA
18 PT6628      NA    NA     0    NA
19 ES7823      NA    NA    NA     0
20 AR3442      NA    NA    NA     1
21 CO9382      NA    NA    NA     0
22 ES5611      NA    NA    NA     1
23 IT9982      NA    NA    NA     0
24 PT6628      NA    NA    NA     0"
df1 <- read.table(textConnection(df1), header = TRUE)

Created on 2022-12-18 with reprex v2.0.2

Upvotes: 3

Related Questions