Reputation: 11
my dataset was collected as shown on the left (old_df) but I want to group by name such that the answer for each variable is summarised on one row for each observation like the table on the right (new_df). I used the following code but it's not working because the second observation (james) has no value for the educated column and therefore the number of its column is just 3. Does anyone know what I need to use to replace 'na.omit' to make this work. PS: If I put an answer in the educated column for james, the code works fine.
new_df <- old_df %>%
group_by(name)%>%
summarise_all(na.omit)
Find the image here : https://i.sstatic.net/nvp5P.jpg
Upvotes: 1
Views: 270
Reputation: 389235
You can select the first value after na.omit
. This will return NA
if all the values are NA
.
Using data from @Martin Gal
library(dplyr)
df %>% group_by(name) %>% summarise(across(.fns = ~na.omit(.)[1]))
# name married employed citizen educated
# <chr> <chr> <chr> <chr> <chr>
#1 james yes no no NA
#2 john yes yes yes yes
#3 peter no no yes yes
Or with summarise_all
-
df %>% group_by(name) %>% summarise_all(~na.omit(.)[1])
Upvotes: 1
Reputation: 16998
You could use
library(tidyr)
library(dplyr)
df %>%
group_by(name) %>%
fill(!name, .direction="downup") %>%
distinct() %>%
ungroup()
to get
# A tibble: 3 x 5
name married employed citizen educated
<chr> <chr> <chr> <chr> <chr>
1 john yes yes yes yes
2 james yes no no NA
3 peter no no yes yes
Note: summarise_all
has been superseded by across()
. If your dplyr
version doesn't support across()
, make sure to update to the latest version.
structure(list(name = c("john", "john", "john", "john", "james",
"james", "james", "james", "peter", "peter", "peter", "peter"
), married = c("yes", NA, NA, NA, "yes", NA, NA, NA, "no", NA,
NA, NA), employed = c(NA, "yes", NA, NA, NA, "no", NA, NA, NA,
"no", NA, NA), citizen = c(NA, NA, "yes", NA, NA, NA, "no", NA,
NA, NA, "yes", NA), educated = c(NA, NA, NA, "yes", NA, NA, NA,
NA, NA, NA, NA, "yes")), problems = structure(list(row = 12L,
col = "educated", expected = "", actual = "embedded null",
file = "literal data"), row.names = c(NA, -1L), class = c("tbl_df",
"tbl", "data.frame")), class = "data.frame", row.names = c(NA,
-12L), spec = structure(list(cols = list(name = structure(list(), class = c("collector_character",
"collector")), married = structure(list(), class = c("collector_character",
"collector")), employed = structure(list(), class = c("collector_character",
"collector")), citizen = structure(list(), class = c("collector_character",
"collector")), educated = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
Upvotes: 2
Reputation: 79338
You could do:
df %>%
group_by(name) %>%
summarise_all(~invoke(coalesce, as.list(.x)))
# A tibble: 3 x 5
name married employecd citizen educated
<chr> <chr> <chr> <chr> <chr>
1 james yes no no NA
2 John yes yes no yes
3 peter no no yes yes
or
df %>%
group_by(name) %>%
summarise_all(~lift(coalesce)(as.list(.x)))
Upvotes: 1