Reputation: 3656
I am running forecasts on data.tables that sometimes have NA values in the tail. These values are not yet available and can be removed. How do I remove NA values at the end of a series?
library(data.table)
DT = data.table(x=rep(c("a","b","c"),each=3), y=c(1,3,6), v=1:9)
DT[c(4, 8:9), v := NA]
# required output:
DT[, head(.SD, 7 )]
Upvotes: 3
Views: 148
Reputation: 66819
Find the last non-NA
and keep everything up to it:
DT[ seq( max(which(!is.na(v))) ) ]
Similarly, you could use which.max
from the reverse:
DT[ seq( .N-which.max(rev(!is.na(v)))+1L ) ]
Both options work fine if there are no NA
to drop at the end.
They behave differently if v
is entirely NA
:
max(which(!is.na(v)))
will give -Inf
because which
will return an empty vector.
This makes seq
throw an error.
which.max(rev(!is.na(v))
will return 1
because FALSE
is the max value, found in position 1.
This means all rows are returned.
If you want some other behavior in that case (like returning no rows), it should be simple to work out.
Upvotes: 1