Reputation: 117
example <- structure(list(id = c(1, 1, 2, 2, 3, 3, 4), var1 = c(5, NA, 8,
9, 10, NA, 3), var2 = c(23, NA, 8, 9, NA, NA, 6), var3 = c(NA,
NA, NA, NA, NA, NA, NA), var4 = c(5, 5, 6, 6, 7, 7, 8)), row.names = c(NA,
-7L), class = "data.frame")
I have a large and complex data frame where the same measure may be present more than once, and the different entries for the same measure may contain different data. I have created a simplified example above. What I would like to do is merge this down as much as possible. I think it's easiest for me to explain if I consult the example:
print(example)
id var1 var2 var3 var4
1 1 5 23 NA 5
2 1 NA NA NA 5
3 2 8 8 NA 6
4 2 9 9 NA 6
5 3 10 NA NA 7
6 3 NA NA NA 7
7 4 3 6 NA 8
Thus, the desired output looks like:
id var1 var2 var3 var4
1 1 5 23 NA 5
2 2 8 8 NA 6
3 2 9 9 NA 6
4 3 10 NA NA 7
5 4 3 6 NA 8
There are similar questions on StackOverflow, but none of them deal with all of these features simultaneously. The closest I've got to this is:
example %>%
group_by(id) %>%
summarise_all(funs(list(na.omit(.))))
id var1 var2 var3 var4
<dbl> <list> <list> <list> <list>
1 1 <dbl [1]> <dbl [1]> <lgl [0]> <dbl [2]>
2 2 <dbl [2]> <dbl [2]> <lgl [0]> <dbl [2]>
3 3 <dbl [1]> <dbl [0]> <lgl [0]> <dbl [2]>
4 4 <dbl [1]> <dbl [1]> <lgl [0]> <dbl [1]>
Which essentially does what I want, but puts everything into lists, which hinders any downstream work with the dataframe. Instead of getting two rows for ID 2 I get a single row, which contains a list with 8 and 9 for var1, for example. This also occurs for matching variables: for ID1, var4 you get a list containing 5, 5.
Upvotes: 2
Views: 56
Reputation: 388982
You can do this across
. For each column and id
, you can keep only unique
non-NA values,
library(dplyr)
example %>%
group_by(id) %>%
summarise(across(.fns = ~{x <- unique(na.omit(.));x[1:max(1, length(x))]}))
#With summarise_all you can do
#summarise_all(~{x <- unique(na.omit(.));x[1:max(1, length(x))]})
# id var1 var2 var3 var4
# <dbl> <dbl> <dbl> <lgl> <dbl>
#1 1 5 23 NA 5
#2 2 8 8 NA 6
#3 2 9 9 NA 6
#4 3 10 NA NA 7
#5 4 3 6 NA 8
Upvotes: 2