LucasMation
LucasMation

Reputation: 2489

keep last non missing observation for all variables by group

My data has multiple columns and some of those columns have missing values in different rows. I would like to group (collapse) the data by the variable "g", keeping the last non missing obserbation of each varianle.

Input:

d <- data.table(a=c(1,NA,3,4),b=c(1,2,3,4),c=c(NA,NA,'c',NA),g=c(1,1,2,2))

Desired output

d_g <- data.table(a=c(1,4),b=c(2,4),c=c(NA,'c'),g=c(1,2))

data.table (or dplyr) solution prefered here

OBS:this is related to this question, but the main answers there seem to cause unecessary NAs in some groups

Upvotes: 2

Views: 778

Answers (3)

GKi
GKi

Reputation: 39667

In base aggregatecould be used.

aggregate(.~g, d, function(x) tail(x[!is.na(x)], 1), na.action = NULL)
#  g a b c
#1 1 1 2  
#2 2 4 4 c

Upvotes: 2

Ronak Shah
Ronak Shah

Reputation: 389047

Using data.table :

library(data.table)

d[, lapply(.SD, function(x) last(na.omit(x))), g]

#   g a b    c
#1: 1 1 2 <NA>
#2: 2 4 4    c

Upvotes: 4

tmfmnk
tmfmnk

Reputation: 39858

One option using dplyr could be:

d %>%
 group_by(g) %>%
 summarise(across(everything(), ~ if(all(is.na(.))) NA else last(na.omit(.))))

      g     a     b c    
  <dbl> <dbl> <dbl> <chr>
1     1     1     2 <NA> 
2     2     4     4 c 

Upvotes: 3

Related Questions