Reputation: 341
Basically, I have a very large data frame/data table and I would like to search a column for the first, and closest, NA
value which is less than my current index position.
For example, let's say I have a data frame DF
as follows:
INDEX | KEY | ITEM
----------------------
1 | 10 | AAA
2 | 12 | AAA
3 | NA | AAA
4 | 18 | AAA
5 | NA | AAA
6 | 24 | AAA
7 | 29 | AAA
8 | 31 | AAA
9 | 34 | AAA
From this data frame we have an NA
value at index 3 and at index 5. Now, let's say we start at index 8 (which has KEY
of 31). I would like to search the column KEY
backwards such that the moment it finds the first instance of NA
the search stops, and the index of the NA value is returned.
I know there are ways to find all NA
values in a vector/column (for example, I can use which(is.na(x))
to return the index values which have NA
) but due to the sheer size of the data frame I am working and due to the large number of iterations that need to be performed this is a very inefficient way of doing it. One method I thought of doing is creating a kind of "do while" loop and it does seem to work, but this again seems quite inefficient since it needs to perform calculations each time (and given that I need to do over 100,000 iterations this does not look like a good idea).
Is there a fast way of searching a column backwards from a particular index such that I can find the index of the closest NA
value?
Upvotes: 2
Views: 512
Reputation: 60080
Why not do a forward-fill of the NA indexes once, so that you can then look up the most recent NA for any row in future:
library(dplyr)
library(tidyr)
df = df %>%
mutate(last_missing = if_else(is.na(KEY), INDEX, as.integer(NA))) %>%
fill(last_missing)
Output:
> df
INDEX KEY ITEM last_missing
1 1 10 AAA NA
2 2 12 AAA NA
3 3 NA AAA 3
4 4 18 AAA 3
5 5 NA AAA 5
6 6 24 AAA 5
7 7 29 AAA 5
8 8 31 AAA 5
9 9 34 AAA 5
Now there's no need to recalculate every time you need the answer for a given row. There may be more efficient ways to do the forward fill, but I think exploring those is easier than figuring out how to optimise the backward search.
Upvotes: 4