Reputation: 2489
My data has multiple columns and some of those columns have missing values in different rows. I would like to group (collapse) the data by the variable "g", keeping the last non missing obserbation of each varianle.
Input:
d <- data.table(a=c(1,NA,3,4),b=c(1,2,3,4),c=c(NA,NA,'c',NA),g=c(1,1,2,2))
Desired output
d_g <- data.table(a=c(1,4),b=c(2,4),c=c(NA,'c'),g=c(1,2))
data.table (or dplyr) solution prefered here
OBS:this is related to this question, but the main answers there seem to cause unecessary NAs in some groups
Upvotes: 2
Views: 778
Reputation: 39667
In base aggregate
could be used.
aggregate(.~g, d, function(x) tail(x[!is.na(x)], 1), na.action = NULL)
# g a b c
#1 1 1 2
#2 2 4 4 c
Upvotes: 2
Reputation: 389047
Using data.table
:
library(data.table)
d[, lapply(.SD, function(x) last(na.omit(x))), g]
# g a b c
#1: 1 1 2 <NA>
#2: 2 4 4 c
Upvotes: 4
Reputation: 39858
One option using dplyr
could be:
d %>%
group_by(g) %>%
summarise(across(everything(), ~ if(all(is.na(.))) NA else last(na.omit(.))))
g a b c
<dbl> <dbl> <dbl> <chr>
1 1 1 2 <NA>
2 2 4 4 c
Upvotes: 3