Henk
Henk

Reputation: 3656

remove last NA values in a data.table

I am running forecasts on data.tables that sometimes have NA values in the tail. These values are not yet available and can be removed. How do I remove NA values at the end of a series?

library(data.table)

DT = data.table(x=rep(c("a","b","c"),each=3), y=c(1,3,6), v=1:9)
DT[c(4, 8:9), v := NA]

# required output:
DT[, head(.SD, 7 )]

Upvotes: 3

Views: 148

Answers (1)

Frank
Frank

Reputation: 66819

Find the last non-NA and keep everything up to it:

DT[ seq( max(which(!is.na(v))) ) ]

Similarly, you could use which.max from the reverse:

DT[ seq( .N-which.max(rev(!is.na(v)))+1L ) ]

Both options work fine if there are no NA to drop at the end.

They behave differently if v is entirely NA:

  • max(which(!is.na(v))) will give -Inf because which will return an empty vector.
    This makes seq throw an error.

  • which.max(rev(!is.na(v)) will return 1 because FALSE is the max value, found in position 1.
    This means all rows are returned.

If you want some other behavior in that case (like returning no rows), it should be simple to work out.

Upvotes: 1

Related Questions