ThePlowKing
ThePlowKing

Reputation: 341

Searching a vector/data table backwards in R

Basically, I have a very large data frame/data table and I would like to search a column for the first, and closest, NA value which is less than my current index position.

For example, let's say I have a data frame DF as follows:

INDEX | KEY   |   ITEM
----------------------
 1    |  10   |    AAA
 2    |  12   |    AAA
 3    |  NA   |    AAA
 4    |  18   |    AAA
 5    |  NA   |    AAA
 6    |  24   |    AAA
 7    |  29   |    AAA
 8    |  31   |    AAA
 9    |  34   |    AAA

From this data frame we have an NA value at index 3 and at index 5. Now, let's say we start at index 8 (which has KEY of 31). I would like to search the column KEY backwards such that the moment it finds the first instance of NA the search stops, and the index of the NA value is returned.

I know there are ways to find all NA values in a vector/column (for example, I can use which(is.na(x)) to return the index values which have NA) but due to the sheer size of the data frame I am working and due to the large number of iterations that need to be performed this is a very inefficient way of doing it. One method I thought of doing is creating a kind of "do while" loop and it does seem to work, but this again seems quite inefficient since it needs to perform calculations each time (and given that I need to do over 100,000 iterations this does not look like a good idea).

Is there a fast way of searching a column backwards from a particular index such that I can find the index of the closest NA value?

Upvotes: 2

Views: 512

Answers (1)

Marius
Marius

Reputation: 60080

Why not do a forward-fill of the NA indexes once, so that you can then look up the most recent NA for any row in future:

library(dplyr)
library(tidyr)

df = df %>%
    mutate(last_missing = if_else(is.na(KEY), INDEX, as.integer(NA))) %>%
    fill(last_missing)

Output:

> df
  INDEX KEY ITEM last_missing
1     1  10  AAA           NA
2     2  12  AAA           NA
3     3  NA  AAA            3
4     4  18  AAA            3
5     5  NA  AAA            5
6     6  24  AAA            5
7     7  29  AAA            5
8     8  31  AAA            5
9     9  34  AAA            5

Now there's no need to recalculate every time you need the answer for a given row. There may be more efficient ways to do the forward fill, but I think exploring those is easier than figuring out how to optimise the backward search.

Upvotes: 4

Related Questions