Reputation: 174
I am working with a few survey responses and the data returned has been formatted awkwardly. Here is an example of what the data looks like:
df <- data_frame(Person = c("Person1", "Person1","Person2", "Person2","Person3", "Person3"), Q1 = c(NA, 1, NA, 2, NA, 1), Q2 = c(NA, 3, NA, 2, NA, 4),
Q3 = c(2, NA, 4, NA, 1, NA), Q4 = c(5, NA, 5, NA, 5, NA))
This is what I am starting with:
Person Q1 Q2 Q3 Q4
<chr> <dbl> <dbl> <dbl> <dbl>
1 Person1 NA NA 2 5
2 Person1 1 3 NA NA
3 Person2 NA NA 4 5
4 Person2 2 2 NA NA
5 Person3 NA NA 1 5
6 Person3 1 4 NA NA
This is what I would like:
Person Q1 Q2 Q3 Q4
<chr> <dbl> <dbl> <dbl> <dbl>
1 Person1 1 3 2 5
2 Person2 2 2 4 5
3 Person3 1 4 1 5
I would like to be able to accomplish this using dplyr but so far I have not had any luck.
Upvotes: 0
Views: 44
Reputation: 28675
You can get the first non-NA for each column within each group with coalesce
. No real reason to prefer that over na.omit
though unless you have >1 non-NA value.
library(tidyverse)
df %>%
group_by(Person) %>%
summarise_all(reduce, coalesce)
# # A tibble: 3 x 5
# Person Q1 Q2 Q3 Q4
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 Person1 1 3 2 5
# 2 Person2 2 2 4 5
# 3 Person3 1 4 1 5
Upvotes: 1
Reputation: 886938
If we have only one non-NA element per each column per group
library(dplyr)
df %>%
group_by(Person) %>%
summarise_all(na.omit)
# A tibble: 3 x 5
# Person Q1 Q2 Q3 Q4
# <chr> <dbl> <dbl> <dbl> <dbl>
#1 Person1 1 3 2 5
#2 Person2 2 2 4 5
#3 Person3 1 4 1 5
We can also use min/max/sum/median/
etc
df %>%
group_by(Person) %>%
summarise_all(mean, na.rm = TRUE)
Or
df %>%
group_by(Person) %>%
summarise_all(min, na.rm = TRUE)
Or
df %>%
group_by(Person) %>%
summarise_all(median, na.rm = TRUE)
Also, any of the functions that remove the NA
and get the first
non-NA element
df %>%
group_by(Person) %>%
summarise_all(list(~.[!is.na(.)]))
If the non-NA elements are more than 1, then either paste
in a string or have a list
column
df %>%
group_by(Person) %>%
summarise_all(list(~ toString(.[!is.na(.)])))
Upvotes: 1