How do use summarise_all after a group_by function

my dataset was collected as shown on the left (old_df) but I want to group by name such that the answer for each variable is summarised on one row for each observation like the table on the right (new_df). I used the following code but it's not working because the second observation (james) has no value for the educated column and therefore the number of its column is just 3. Does anyone know what I need to use to replace 'na.omit' to make this work. PS: If I put an answer in the educated column for james, the code works fine.

                 new_df <- old_df %>%
                             group_by(name)%>%
                             summarise_all(na.omit)

Find the image here : https://i.sstatic.net/nvp5P.jpg

Upvotes: 1

Views: 270

Answers (3)

Ronak Shah
Ronak Shah

Reputation: 389235

You can select the first value after na.omit. This will return NA if all the values are NA.

Using data from @Martin Gal

library(dplyr)
df %>% group_by(name) %>% summarise(across(.fns = ~na.omit(.)[1]))

#  name  married employed citizen educated
#  <chr> <chr>   <chr>    <chr>   <chr>   
#1 james yes     no       no      NA      
#2 john  yes     yes      yes     yes     
#3 peter no      no       yes     yes     

Or with summarise_all -

df %>% group_by(name) %>% summarise_all(~na.omit(.)[1])

Upvotes: 1

Martin Gal
Martin Gal

Reputation: 16998

You could use

library(tidyr)
library(dplyr)

df %>% 
  group_by(name) %>% 
  fill(!name, .direction="downup") %>% 
  distinct() %>%
  ungroup()

to get

# A tibble: 3 x 5
  name  married employed citizen educated
  <chr> <chr>   <chr>    <chr>   <chr>   
1 john  yes     yes      yes     yes     
2 james yes     no       no      NA      
3 peter no      no       yes     yes   

Note: summarise_all has been superseded by across(). If your dplyr version doesn't support across(), make sure to update to the latest version.

Data

structure(list(name = c("john", "john", "john", "john", "james", 
"james", "james", "james", "peter", "peter", "peter", "peter"
), married = c("yes", NA, NA, NA, "yes", NA, NA, NA, "no", NA, 
NA, NA), employed = c(NA, "yes", NA, NA, NA, "no", NA, NA, NA, 
"no", NA, NA), citizen = c(NA, NA, "yes", NA, NA, NA, "no", NA, 
NA, NA, "yes", NA), educated = c(NA, NA, NA, "yes", NA, NA, NA, 
NA, NA, NA, NA, "yes")), problems = structure(list(row = 12L, 
    col = "educated", expected = "", actual = "embedded null", 
    file = "literal data"), row.names = c(NA, -1L), class = c("tbl_df", 
"tbl", "data.frame")), class = "data.frame", row.names = c(NA, 
-12L), spec = structure(list(cols = list(name = structure(list(), class = c("collector_character", 
"collector")), married = structure(list(), class = c("collector_character", 
"collector")), employed = structure(list(), class = c("collector_character", 
"collector")), citizen = structure(list(), class = c("collector_character", 
"collector")), educated = structure(list(), class = c("collector_character", 
"collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1L), class = "col_spec"))

Upvotes: 2

Onyambu
Onyambu

Reputation: 79338

You could do:

df %>%
  group_by(name) %>%
  summarise_all(~invoke(coalesce, as.list(.x)))

# A tibble: 3 x 5
  name  married employecd citizen educated
  <chr> <chr>   <chr>     <chr>   <chr>   
1 james yes     no        no      NA      
2 John  yes     yes       no      yes     
3 peter no      no        yes     yes  

or

df %>%
  group_by(name) %>%
  summarise_all(~lift(coalesce)(as.list(.x)))

Upvotes: 1

Related Questions