Dan

Reputation: 113

r - Lag a data.frame by the number of NAs

In other words, I am trying to lag a data.frame that looks like this:

V1 V2 V3 V4 V5 V6 
1  1  1  1  1  1
2  2  2  2  2  NA
3  3  3  3  NA NA
4  4  4  NA NA NA
5  5  NA NA NA NA
6  NA NA NA NA NA

To something that looks like this:

V1 V2 V3 V4 V5 V6 
1  NA NA NA NA NA
2  1  NA NA NA NA
3  2  1  NA NA NA
4  3  2  1  NA NA
5  4  3  2  1  NA
6  5  4  3  2  1

So far, I have used a function that counts the number of NAs, and have tried to lag each column in my data.frame by the corresponding number of NAs in that column.

V1 <- c(1,2,3,4,5,6)
V2 <- c(1,2,3,4,5,NA)
V3 <- c(1,2,3,4,NA,NA)
V4 <- c(1,2,3,NA,NA,NA)
V5 <- c(1,2,NA,NA,NA,NA)
V6 <- c(1,NA,NA,NA,NA,NA)
mydata <- cbind(V1,V2,V3,V4,V5,V6)
na.count <- colSums(is.na(mydata))
lag.by <- function(mydata, na.count){lag(mydata, k = na.count)}
lagged.df <- apply(mydata, 2, lag.by) 

But this code just lags the entire data.frame by one...

Upvotes: 2

Views: 198

Answers (2)

phiver

Reputation: 23598

You could use the sort function with option na.last = FALSE like this:

edit:

Akrun's comment is a valid one. If the values need to stay in the order in which they appear in the data.frame, then Akrun's answer is the best. sort will put everything in order from low to high, with the NAs in front.

library(purrr)
map_df(mydata, sort, na.last = FALSE)
# A tibble: 6 x 6
     V1    V2    V3    V4    V5    V6
  <int> <int> <int> <int> <int> <int>
1     1    NA    NA    NA    NA    NA
2     2     1    NA    NA    NA    NA
3     3     2     1    NA    NA    NA
4     4     3     2     1    NA    NA
5     5     4     3     2     1    NA
6     6     5     4     3     2     1

Or apply:

apply(mydata, 2, sort , na.last = FALSE)
     V1 V2 V3 V4 V5 V6
[1,]  1 NA NA NA NA NA
[2,]  2  1 NA NA NA NA
[3,]  3  2  1 NA NA NA
[4,]  4  3  2  1 NA NA
[5,]  5  4  3  2  1 NA
[6,]  6  5  4  3  2  1

edit2:

As nicolo commented, order can preserve the original order of the values within each column.

mydata[,3] <- c(4, 3, 1, 2, NA, NA)
map_df(mydata, function(x) x[order(!is.na(x))])
# A tibble: 6 x 6
     V1    V2    V3    V4    V5    V6
  <int> <int> <dbl> <int> <int> <int>
1     1    NA    NA    NA    NA    NA
2     2     1    NA    NA    NA    NA
3     3     2     4    NA    NA    NA
4     4     3     3     1    NA    NA
5     5     4     1     2     1    NA
6     6     5     2     3     2     1
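The same idea can be written without purrr. The following is a minimal base R sketch, assuming the data is stored as a data.frame (the question's cbind() call produces a matrix, so the example is rebuilt with data.frame() here). The helper name na_first is hypothetical and only used for illustration.

```r
# Rebuild part of the example as a data.frame (assumption: a data.frame is wanted).
mydata <- data.frame(
  V1 = c(1, 2, 3, 4, 5, 6),
  V2 = c(1, 2, 3, 4, 5, NA),
  V3 = c(4, 3, 1, 2, NA, NA)
)

# Hypothetical helper: order() on the logical vector puts FALSE (the NAs) first;
# ties keep their original positions, so non-NA values stay in their original order.
na_first <- function(x) x[order(!is.na(x))]

# Assigning into result[] keeps the data.frame class and the column names.
result <- mydata
result[] <- lapply(mydata, na_first)
print(result)
```

Because ties are left in their original order, V3 comes out as NA, NA, 4, 3, 1, 2 rather than being sorted from low to high.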

Upvotes: 4

akrun

Reputation: 887128

One option would be to loop through the columns with apply and, for each column, concatenate the NA elements (selected with is.na) followed by the non-NA elements (selected by negating that logical vector).

apply(mydata, 2, function(x) c(x[is.na(x)], x[!is.na(x)]))
#     V1 V2 V3 V4 V5 V6
#[1,]  1 NA NA NA NA NA
#[2,]  2  1 NA NA NA NA
#[3,]  3  2  1 NA NA NA
#[4,]  4  3  2  1 NA NA
#[5,]  5  4  3  2  1 NA
#[6,]  6  5  4  3  2  1
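The idea from the question, shifting each column by its own NA count, can also be written out directly. This is only a sketch under two assumptions: the data is a data.frame, and all NAs in a column are trailing (as in the question's example). The helper shift_down is a hypothetical name, not a function from any package.

```r
mydata <- data.frame(
  V1 = c(1, 2, 3, 4, 5, 6),
  V2 = c(1, 2, 3, 4, 5, NA),
  V3 = c(1, 2, 3, 4, NA, NA),
  V4 = c(1, 2, 3, NA, NA, NA),
  V5 = c(1, 2, NA, NA, NA, NA),
  V6 = c(1, NA, NA, NA, NA, NA)
)

# Number of NAs per column, as in the question.
na.count <- colSums(is.na(mydata))

# Hypothetical helper: shift x down by k positions, padding the top with NA
# and dropping the last k elements (assumed to be the trailing NAs).
shift_down <- function(x, k) c(rep(NA, k), head(x, length(x) - k))

# Map() pairs each column with its own shift, which apply() in the question did not do.
lagged <- as.data.frame(Map(shift_down, mydata, na.count))
print(lagged)
```

If a column could contain NAs in the middle, this shift would drop real values, so the is.na subsetting above is the safer choice in that case.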

Upvotes: 6
