I have a list of data.frames of equal size. Each data.frame has missing data in different rows and columns. I would like to remove a row from every data.frame whenever that row contains a NaN in any one of them. My current lapply and na.omit code only removes rows from the data.frame that actually contains the NaN, which makes sense, since it processes each data.frame in the list independently before moving on to the next one. However, I want a row that has a NaN in one data.frame to be removed from all the other data.frames as well.
Some example code:
# Make list
ls <- list(x1 = data.frame(a = c(1, 2, 3, 4), b = c(2, 3, 4, 5), c = c(3, 4, NaN, 6)),
           x2 = data.frame(a = c(1, NaN, 3, 4), b = c(2, 3, 4, 5), c = c(3, 4, 5, 6)))
# Desired output
lscalc <- list(x1 = data.frame(a = c(1, 4), b = c(2, 5), c = c(3, 6)),
               x2 = data.frame(a = c(1, 4), b = c(2, 5), c = c(3, 6)))
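A minimal sketch of the per-data.frame behaviour described above, assuming the current code is simply lapply() combined with na.omit() (the actual code is not shown in the question). The name dfs is hypothetical and is used here only to avoid masking base::ls().

```r
# Same data as in the question, under a hypothetical name
dfs <- list(x1 = data.frame(a = c(1, 2, 3, 4), b = c(2, 3, 4, 5), c = c(3, 4, NaN, 6)),
            x2 = data.frame(a = c(1, NaN, 3, 4), b = c(2, 3, 4, 5), c = c(3, 4, 5, 6)))

# na.omit() is applied to each data.frame on its own, so each one
# only loses its own incomplete rows
per_df <- lapply(dfs, na.omit)

rownames(per_df$x1)  # row 3 (the NaN in column c) is dropped from x1 only
rownames(per_df$x2)  # row 2 (the NaN in column a) is dropped from x2 only
```

The two results no longer line up row by row, which is the problem the question is about.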
Upvotes: 1
Here's one using complete.cases(), though otherwise along the same lines as @akrun's.
# Make list
l <- list(x1 = data.frame(a = c(1, 2, 3, 4), b = c(2, 3, 4, 5), c = c(3, 4, NaN, 6)),
          x2 = data.frame(a = c(1, NaN, 3, 4), b = c(2, 3, 4, 5), c = c(3, 4, 5, 6)))
# Desired output
lcalc <- list(x1 = data.frame(a = c(1, 4), b = c(2, 5), c = c(3, 6)),
              x2 = data.frame(a = c(1, 4), b = c(2, 5), c = c(3, 6)))
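# Rows that are incomplete (contain NA/NaN) in at least one data.frame
inds <- lapply(l, \(x) which(!complete.cases(x)))
inds <- unique(do.call(c, inds))
# Drop those rows everywhere; unlike x[-inds, ], this also works when inds is empty
lcalc2 <- lapply(l, \(x) x[!seq_len(nrow(x)) %in% inds, ])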
lcalc2
#> $x1
#> a b c
#> 1 1 2 3
#> 4 4 5 6
#>
#> $x2
#> a b c
#> 1 1 2 3
#> 4 4 5 6
Created on 2022-05-24 by the reprex package (v2.0.1)
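The same idea can also be sketched with a single logical mask instead of row indices. This is only one possible variant, not part of the reprex above: keep_rows is a hypothetical helper name, and it assumes every data.frame in the list has the same number of rows.

```r
l <- list(x1 = data.frame(a = c(1, 2, 3, 4), b = c(2, 3, 4, 5), c = c(3, 4, NaN, 6)),
          x2 = data.frame(a = c(1, NaN, 3, 4), b = c(2, 3, 4, 5), c = c(3, 4, 5, 6)))

# Hypothetical helper: keep only the rows that are complete in every data.frame
keep_rows <- function(dfs) {
  # Assumption: all data.frames have the same number of rows
  stopifnot(length(unique(vapply(dfs, nrow, integer(1)))) == 1)
  # TRUE for rows that have no NA/NaN in any of the data.frames
  keep <- Reduce(`&`, lapply(dfs, complete.cases))
  lapply(dfs, function(x) x[keep, , drop = FALSE])
}

res <- keep_rows(l)
res
```

Because the rows are selected with a logical vector, nothing is dropped by accident when no data.frame has missing values.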
Upvotes: 1
Assuming all the datasets have the same number of rows, first collect the indices of the rows that contain an NA/NaN in any of the datasets, then loop over the list and remove those rows from each one:
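# Row indices with an NA/NaN in any of the datasets (is.na() is also TRUE for NaN)
un1 <- unique(unlist(lapply(ls, function(x) which(is.na(x), arr.ind = TRUE)[, 1])))
# Keep only the rows whose index is not in un1
lapply(ls, function(x) x[!seq_len(nrow(x)) %in% un1, ])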
$x1
a b c
1 1 2 3
4 4 5 6
$x2
a b c
1 1 2 3
4 4 5 6
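When comparing a result like this with the desired output from the question, note that the filtered data.frames keep their original row names (1 and 4), while the desired output is numbered 1 and 2. A small sketch of one way to check the match, where ls_in, expected and res are hypothetical names:

```r
ls_in <- list(x1 = data.frame(a = c(1, 2, 3, 4), b = c(2, 3, 4, 5), c = c(3, 4, NaN, 6)),
              x2 = data.frame(a = c(1, NaN, 3, 4), b = c(2, 3, 4, 5), c = c(3, 4, 5, 6)))
expected <- list(x1 = data.frame(a = c(1, 4), b = c(2, 5), c = c(3, 6)),
                 x2 = data.frame(a = c(1, 4), b = c(2, 5), c = c(3, 6)))

# Same filtering as above
un1 <- unique(unlist(lapply(ls_in, function(x) which(is.na(x), arr.ind = TRUE)[, 1])))
res <- lapply(ls_in, function(x) x[!seq_len(nrow(x)) %in% un1, ])

# Reset row names so they run 1..n, like those of `expected`
res_reset <- lapply(res, function(x) { rownames(x) <- NULL; x })

# Compare values and attributes with the desired output
isTRUE(all.equal(res_reset, expected))
```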
Upvotes: 1