SuperSatya

Reputation: 65

Date closest but earlier than a given date based on another factor

My data is similar to the following:

df <- data.frame(date=c("2013-07","2013-08","2013-09","2013-11",
                        "2013-11","2013-11","2014-02","2014-03"),
                 X=c("0","1","0","0","1","0","1","0"))

     date X
1 2013-07 0
2 2013-08 1
3 2013-09 0
4 2013-11 0
5 2013-11 1
6 2013-11 0
7 2014-02 1
8 2014-03 0

I want to create a new feature that shows, for each row, the closest date that is not later than the current date among the rows where X = 1:

     date X lastdate
1 2013-07 0       NA
2 2013-08 1  2013-08
3 2013-09 0  2013-08
4 2013-11 0  2013-11
5 2013-11 1  2013-11
6 2013-11 0  2013-11
7 2014-02 1  2014-02
8 2014-03 0  2014-02

Upvotes: 0

Views: 148

Answers (1)

David Arenburg

Reputation: 92300

One efficient solution could be to use findInterval and search only within the rows where X == "1". I've added NA_character_ at the beginning for the case when findInterval returns a zero (as in your first row).

Some explanation of the methodology:

The basic idea is to take the dates where X == "1", df$date[df$X == "1"], and build a second vector from them with an NA added at the beginning. findInterval then searches within the original df$date[df$X == "1"] vector and, for each value in df$date, returns the position of the last entry that is not later than it. Whenever a value in df$date is earlier than all values in df$date[df$X == "1"], findInterval assigns it a 0. This zero should eventually become NA, so we add 1 to every index returned by findInterval and use the result to index the new vector (the one that starts with NA). This way every 0 becomes 1 and picks up the leading NA, while every other index k becomes k + 1 and picks up the k-th date.
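To make the index shift concrete, here is a minimal sketch on a toy vector. It uses plain numbers in place of the date column, which is an assumption made only to keep the example independent of how the dates are stored; the names keys, hits, idx and lookup are illustrative and not part of the solution itself.

```r
# Toy month keys (hypothetical values standing in for df$date)
keys <- c(1, 2, 3, 5, 5, 5, 8, 9)
x    <- c("0", "1", "0", "0", "1", "0", "1", "0")

# Keys of the rows where x == "1"; findInterval needs this vector sorted
hits <- keys[x == "1"]            # 2 5 8

# For each key, the number of hits that are <= that key (0 if there are none)
idx <- findInterval(keys, hits)   # 0 1 1 2 2 2 3 3

# Prepend NA so that index 0 + 1 picks NA and index k + 1 picks the k-th hit
lookup <- c(NA, hits)
result <- lookup[idx + 1]         # NA 2 2 5 5 5 8 8
```

The first key is smaller than every hit, so it gets index 0 and ends up as NA after the shift.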

df[["lastdate"]] <- c(NA_character_, 
                      as.character(df$date[df$X == "1"]))[findInterval(df$date, df$date[df$X == "1"]) + 1]
df
#      date X lastdate
# 1 2013-07 0     <NA>
# 2 2013-08 1  2013-08
# 3 2013-09 0  2013-08
# 4 2013-11 0  2013-11
# 5 2013-11 1  2013-11
# 6 2013-11 0  2013-11
# 7 2014-02 1  2014-02
# 8 2014-03 0  2014-02

Or, since this has a dplyr tag:

library(dplyr)
df %>%
   mutate(lastdate = c(NA_character_, as.character(date[X == "1"]))[findInterval(date, date[X == "1"]) + 1])

As a side note, it would probably be easier to work with a numeric X rather than a character one, and with either a character or a zoo::yearmon date column rather than a factor (which is hard to modify).
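A minimal sketch of that suggestion, using base R only: X is stored as an integer and the month strings are kept as character. Since findInterval is documented for numeric input, the sketch parses each month into a Date by appending a day and searches on the resulting day counts. The name df2, the helper vectors d and hit, and the "-01" suffix are illustrative assumptions, not part of the code above.

```r
# Same data, but with character dates and an integer X
df2 <- data.frame(date = c("2013-07", "2013-08", "2013-09", "2013-11",
                           "2013-11", "2013-11", "2014-02", "2014-03"),
                  X = c(0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L),
                  stringsAsFactors = FALSE)

# Parse "YYYY-MM" as the first day of that month, then take the day count
d <- as.numeric(as.Date(paste0(df2$date, "-01")))

# Rows where X == 1; their dates must be sorted for findInterval
hit <- df2$X == 1L

# Index of the latest hit at or before each row, shifted past the leading NA
df2$lastdate <- c(NA_character_, df2$date[hit])[findInterval(d, d[hit]) + 1]
df2
```

Because the search runs on d, this variant should behave the same whether date arrives as a factor or as a character column, provided every value parses as a valid month.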

Upvotes: 1
