Batuhan Kavlak

Reputation: 377

R aggregate strings by dropping NA values

I have a DF like this:

library(dplyr)

df_1 <- tibble(
  id = c(1, 1, 2, 2, 3),
  Class1 = c("C1", NA, "C3", "C3", NA),
  Class2 = c(NA, "C2", NA, NA, "C4")
)
> df_1
# A tibble: 5 x 3
     id Class1 Class2
  <dbl> <chr>  <chr> 
1     1 C1     NA    
2     1 NA     C2    
3     2 C3     NA    
4     2 C3     NA    
5     3 NA     C4 

I need this final output:

> df_1
# A tibble: 3 x 3
     id Class1 Class2
  <dbl> <chr>  <chr> 
1     1 C1     C2    
2     2 C3     NA    
3     3 NA     C4 

I am trying to group and summarize:

df_1 %>% group_by(id) %>% summarise_at(vars(Class1, Class2), ~ unique(.))
# A tibble: 4 x 3
# Groups:   id [3]
     id Class1 Class2
  <dbl> <chr>  <chr> 
1     1 C1     NA    
2     1 NA     C2    
3     2 C3     NA    
4     3 NA     C4  

How can I drop the NA values within each id, so that a column keeps its non-NA value whenever one of the group's rows has one? I couldn't find any example dealing with character columns.
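A minimal base-R illustration of why the attempt above returns extra rows: unique() treats NA as a value of its own, so a group holding one real value and one NA yields two results. The vectors x and y are made up for illustration.

```r
# Values of Class1 within group id == 1
x <- c("C1", NA)

# unique() keeps NA as a distinct value, so the result has length 2,
# which is why summarise_at() emits two rows for that group
unique(x)

# Dropping NA first leaves a single value...
unique(x[!is.na(x)])

# ...but a group that is entirely NA then becomes empty (length 0),
# so a per-group summary also needs to fall back to NA in that case
y <- c(NA_character_)
unique(y[!is.na(y)])
```

This is why the answers below take the first non-NA value and fall back to NA, rather than just removing NA before unique().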

Upvotes: 1

Views: 111

Answers (3)

Bas

Reputation: 4658

dplyr::coalesce does the job:

df_1 %>%
  group_by(id) %>%
  summarise_at(vars(Class1, Class2), function(x) coalesce(!!!x))

gives

# A tibble: 3 x 3
     id Class1 Class2
  <dbl> <chr>  <chr> 
1     1 C1     C2    
2     2 C3     NA    
3     3 NA     C4 
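To see what !!! does here, a small check outside summarise_at() on a made-up vector (assumes dplyr is installed): splicing turns the group's vector into separate arguments, so coalesce() compares them and returns the first non-NA one.

```r
library(dplyr)

x <- c(NA, "C2")

# Without splicing, coalesce() receives a single vector and returns it unchanged
coalesce(x)

# With !!!, the call becomes coalesce(NA, "C2"), which returns "C2"
coalesce(!!!x)

# If every spliced element is NA, the result is NA
coalesce(!!!c(NA_character_, NA_character_))
```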

Upvotes: 5

Ronak Shah

Reputation: 388862

You can take the first non-NA value of each Class column within every group.

library(dplyr)

df_1 %>%
 group_by(id) %>%
 summarise(across(starts_with('Class'), ~na.omit(.)[1]))
 #In older dplyr use summarise_at
 #summarise_at(vars(starts_with('Class')), ~na.omit(.)[1])

# A tibble: 3 x 3
#     id Class1 Class2
#  <dbl> <chr>  <chr> 
#1     1 C1     C2    
#2     2 C3     NA    
#3     3 NA     C4    
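Why na.omit(.)[1] also copes with groups that have no values at all: once na.omit() removes every NA, indexing the empty vector with [1] returns NA instead of failing, so each group always produces exactly one value. A base-R check on made-up vectors:

```r
# One real value: the NA is dropped and the first remaining element is kept
na.omit(c(NA, "C2"))[1]

# All NA: na.omit() leaves a zero-length vector, and [1] on it gives NA
na.omit(c(NA_character_, NA_character_))[1]

# Subsetting with [1] also drops the "na.action" attribute that na.omit() adds
attributes(na.omit(c(NA, "C2"))[1])
```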

Upvotes: 2

Yuriy Saraykin

Reputation: 8880

Another solution: fill in the gaps within each group, then remove the duplicate rows.

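library(dplyr)
library(tidyr)  # fill() comes from tidyr

df_1 %>% 
  group_by(id) %>% 
  fill(everything(), .direction = "updown") %>%  # copy non-NA values up and down within each id
  distinct() %>%                                 # collapse the now-duplicated rows in each group
  ungroup()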
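A sketch for checking that the three answers agree on this data (assumes dplyr and tidyr are installed; the res_* names and the same() helper are made up for the check). It rebuilds df_1 so it runs on its own.

```r
library(dplyr)
library(tidyr)

df_1 <- tibble(
  id = c(1, 1, 2, 2, 3),
  Class1 = c("C1", NA, "C3", "C3", NA),
  Class2 = c(NA, "C2", NA, NA, "C4")
)

# coalesce() over the spliced group values
res_coalesce <- df_1 %>%
  group_by(id) %>%
  summarise_at(vars(Class1, Class2), function(x) coalesce(!!!x))

# First non-NA value of each Class column
res_first <- df_1 %>%
  group_by(id) %>%
  summarise(across(starts_with("Class"), ~ na.omit(.)[1]))

# Fill within groups, then drop duplicate rows
res_fill <- df_1 %>%
  group_by(id) %>%
  fill(everything(), .direction = "updown") %>%
  distinct() %>%
  ungroup()

# Compare as plain data frames so tibble/grouping attributes do not matter
same <- function(a, b) isTRUE(all.equal(as.data.frame(a), as.data.frame(b)))
same(res_coalesce, res_first)
same(res_coalesce, res_fill)
```

All three keep one row per id; the fill() version additionally relies on each group having at most one distinct non-NA value per column, otherwise distinct() would leave more than one row.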

Upvotes: 2
