My data is similar to the following:
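df <- data.frame(date=c("2013-07","2013-08","2013-09","2013-11",
"2013-11","2013-11","2014-02","2014-03"),
X=c("0","1","0","0","1","0","1","0"),
stringsAsFactors=TRUE)  # factor columns, as in the original data (the default before R 4.0.0)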
date X
1 2013-07 0
2 2013-08 1
3 2013-09 0
4 2013-11 0
5 2013-11 1
6 2013-11 0
7 2014-02 1
8 2014-03 0
I want to create a new column, lastdate, that holds the most recent date that is not later than the current row's date and on which X = 1 (NA if there is no such date):
date X lastdate
1 2013-07 0 NA
2 2013-08 1 2013-08
3 2013-09 0 2013-08
4 2013-11 0 2013-11
5 2013-11 1 2013-11
6 2013-11 0 2013-11
7 2014-02 1 2014-02
8 2014-03 0 2014-02
One efficient solution could be to use findInterval and search only within the rows where X == "1". I've added NA_character_ at the beginning of the lookup vector for the case where findInterval returns zero (as in your first row).
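To see what findInterval returns on this data, here is a small sketch. It assumes each "YYYY-MM" value can be mapped to the first day of its month with as.Date, so that findInterval gets plain numbers; dates, X and ones are illustrative names.

```r
# Question's data; each "YYYY-MM" becomes a Date on the 1st of that month
# (an assumption made only so findInterval() receives numeric values).
dates <- as.Date(paste0(c("2013-07", "2013-08", "2013-09", "2013-11",
                          "2013-11", "2013-11", "2014-02", "2014-03"), "-01"))
X <- c("0", "1", "0", "0", "1", "0", "1", "0")

ones <- dates[X == "1"]            # 2013-08, 2013-11, 2014-02 (already sorted)
idx  <- findInterval(dates, ones)  # number of X == "1" dates <= each date
idx
# [1] 0 1 1 2 2 2 3 3
```

The leading 0 is the first row: no X == "1" date is on or before 2013-07.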
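Some explanation of the methodology:
The basic idea is to add an NA at the beginning of df$date[df$X == "1"] and then search within the original df$date[df$X == "1"] vector. For each value in df$date, findInterval returns the position of the last element of df$date[df$X == "1"] that is less than or equal to it, which is exactly the "not later than the current date" rule (so all three 2013-11 rows match the 2013-11 row where X == "1"). This requires df$date[df$X == "1"] to be sorted, which holds here because the data are ordered by date. Whenever a value in df$date is earlier than all values in df$date[df$X == "1"], findInterval assigns it a 0. This zero should eventually become NA, hence we add 1 to all the indices found by findInterval and look them up in the new vector (the one that starts with NA). This way, all 0s become 1s and are assigned NA, because NA is the first value in the new vector, while every other index k + 1 points to the k-th date where X == "1".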
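The "prepend NA, then add 1" trick can be seen in isolation with a toy sketch. The ones and idx values are hard-coded here for illustration (they are the X == "1" dates and the findInterval result for the question's data); note that plain ones[idx] would not work, because R silently drops a 0 index instead of returning NA.

```r
# Toy inputs: the X == "1" dates and the findInterval() result for each row
ones <- c("2013-08", "2013-11", "2014-02")
idx  <- c(0L, 1L, 1L, 2L, 2L, 2L, 3L, 3L)

# Indexing with 0 drops the element, so the result is too short (7, not 8)
short <- ones[idx]
length(short)

# Prepend NA: index 0 + 1 = 1 picks NA, index k + 1 picks ones[k]
lookup   <- c(NA_character_, ones)
lastdate <- lookup[idx + 1]
lastdate
```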
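# findInterval() needs numbers: with a factor date column (as here) it works on
# the integer codes, whose order matches the chronological order of the levels.
# The dates where X == "1" must be sorted, which they are in this data.
df[["lastdate"]] <- c(NA_character_,
  as.character(df$date[df$X == "1"]))[findInterval(df$date, df$date[df$X == "1"]) + 1]
df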
# date X lastdate
# 1 2013-07 0 <NA>
# 2 2013-08 1 2013-08
# 3 2013-09 0 2013-08
# 4 2013-11 0 2013-11
# 5 2013-11 1 2013-11
# 6 2013-11 0 2013-11
# 7 2014-02 1 2014-02
# 8 2014-03 0 2014-02
Or, since the question has a dplyr tag:
library(dplyr)
df %>%
mutate(lastdate = c(NA_character_, as.character(date[X == "1"]))[findInterval(date, date[X == "1"]) + 1])
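To double-check the findInterval result, a brute-force sketch can compare it with a direct row-by-row search. It rebuilds the question's data with stringsAsFactors = TRUE to reproduce the factor columns the code above relies on; fast, slow and d_one are illustrative names, and the string comparison assumes zero-padded "YYYY-MM" dates.

```r
df <- data.frame(date = c("2013-07", "2013-08", "2013-09", "2013-11",
                          "2013-11", "2013-11", "2014-02", "2014-03"),
                 X = c("0", "1", "0", "0", "1", "0", "1", "0"),
                 stringsAsFactors = TRUE)

# findInterval() version (works on the factor's integer codes)
ones <- df$date[df$X == "1"]
fast <- c(NA_character_, as.character(ones))[findInterval(df$date, ones) + 1]

# Brute force: for each row, the latest X == "1" date not later than it.
# Comparing "YYYY-MM" strings is safe because they are zero-padded.
d     <- as.character(df$date)
d_one <- d[df$X == "1"]
slow  <- vapply(d, function(cur) {
  cand <- d_one[d_one <= cur]
  if (length(cand) == 0) NA_character_ else max(cand)
}, character(1), USE.NAMES = FALSE)

identical(fast, slow)
```

The brute-force version is O(n^2), so it is only meant as a check on small data.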
As a side note, it would probably be easier to work with a numeric X column rather than a character one, and with either a character or a zoo::yearmon date column rather than a factor (which is hard to modify).
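A sketch of the same lookup with a numeric X and a non-factor date column. To avoid a zoo dependency it uses base R's Date class (each "YYYY-MM" mapped to the first day of the month) as a stand-in for zoo::yearmon; that mapping is an assumption, not part of the question's data.

```r
# Numeric X and a Date column instead of factors
df <- data.frame(date = as.Date(paste0(c("2013-07", "2013-08", "2013-09", "2013-11",
                                         "2013-11", "2013-11", "2014-02", "2014-03"),
                                       "-01")),
                 X = c(0, 1, 0, 0, 1, 0, 1, 0))

ones <- df$date[df$X == 1]            # Dates where X == 1, already sorted
idx  <- findInterval(df$date, ones)   # 0 means no X == 1 date on or before
df$lastdate <- c(as.Date(NA), ones)[idx + 1]

# Show lastdate in the question's "YYYY-MM" form
format(df$lastdate, "%Y-%m")
```

Because Date is numeric underneath, findInterval works on it directly, with no reliance on factor level order.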