Reputation: 1743
I have a dataframe that looks like this:
df_raw <- structure(list(date = structure(c(17075, 17076, 17077, 17108,
17109, 17110, 17111, 17112, 17113, 17221, 17222, 17223, 17224,
17225, 17226, 17227, 17228, 17229, 17230, 17231, 17232, 17286,
17075, 17076, 17077, 17078, 17079, 17080, 17081, 17082, 17083,
17084, 17085, 17086, 17087, 17088, 17089, 17090, 17091), class = "Date"),
Req_BU = c("12018", "12018", "12018", "12018", "12018", "12018",
"12018", "12018", "12018", "12018", "12018", "12018", "12018",
"12018", "12018", "12018", "12018", "12018", "12018", "12018",
"12018", "12018", "14004", "14004", "14004", "14004", "14004",
"14004", "14004", "14004", "14004", "14004", "14004", "14004",
"14004", "14004", "14004", "14004", "14004"), last_rec_date = c(1L,
1L, 1L, 1L, 1L, NA, NA, 3L, 1L, 1L, 1L, NA, 2L, 1L, 1L, 1L,
1L, 1L, NA, NA, 3L, 1L, NA, NA, 1L, 1L, 1L, 1L, 1L, NA, NA,
3L, 1L, 1L, 1L, 1L, NA, 2L, 1L)), .Names = c("date", "Req_BU",
"last_rec_date"), row.names = c(NA, -39L), class = "data.frame")
> head(df_raw, 10)
date Req_BU last_rec_date
1 2016-10-01 12018 1
2 2016-10-02 12018 1
3 2016-10-03 12018 1
4 2016-11-03 12018 1
5 2016-11-04 12018 1
6 2016-11-05 12018 NA
7 2016-11-06 12018 NA
8 2016-11-07 12018 3
9 2016-11-08 12018 1
10 2017-02-24 12018 1
> df_raw[22:30, ]
date Req_BU last_rec_date
22 2017-04-30 12018 1
23 2016-10-01 14004 NA
24 2016-10-02 14004 NA
25 2016-10-03 14004 1
26 2016-10-04 14004 1
27 2016-10-05 14004 1
28 2016-10-06 14004 1
29 2016-10-07 14004 1
30 2016-10-08 14004 NA
What I need to do is replace the NA values in the last_rec_date column with the number of days since the last non-NA value. This all needs to be done within each group of a grouping variable called Req_BU. My data start on 10-01-2016, and if a particular Req_BU starts with an NA for that date, I need to fill it with a 1 and keep doing that until there is a non-NA value, at which point the normal logic takes over.
I am looking for something like this.
> head(df_hope, 10)
date Req_BU last_rec_date
1 2016-10-01 12018 1
2 2016-10-02 12018 1
3 2016-10-03 12018 1
4 2016-11-03 12018 1
5 2016-11-04 12018 1
6 2016-11-05 12018 1
7 2016-11-06 12018 2
8 2016-11-07 12018 3
9 2016-11-08 12018 1
10 2017-02-24 12018 1
> df_hope[22:30, ]
date Req_BU last_rec_date
22 2017-04-30 12018 1
23 2016-10-01 14004 1
24 2016-10-02 14004 1
25 2016-10-03 14004 1
26 2016-10-04 14004 1
27 2016-10-05 14004 1
28 2016-10-06 14004 1
29 2016-10-07 14004 1
30 2016-10-08 14004 1
I tried this, but it isn't even handling the first part of the logic I need.
library(dplyr)
df_not_working <- df_raw %>%
group_by(Req_BU) %>%
mutate(last_rec_date = ifelse(is.na(last_rec_date),
c(NA, diff(date)),
last_rec_date))
> df_not_working
Source: local data frame [39 x 3]
Groups: Req_BU [2]
# A tibble: 39 x 3
date Req_BU last_rec_date
<date> <chr> <dbl>
1 2016-10-01 12018 1
2 2016-10-02 12018 1
3 2016-10-03 12018 1
4 2016-11-03 12018 1
5 2016-11-04 12018 1
6 2016-11-05 12018 1
7 2016-11-06 12018 1
8 2016-11-07 12018 3
9 2016-11-08 12018 1
10 2017-02-24 12018 1
The rest of the analysis is pretty dplyr-heavy, so I am OK using that or a base solution if one exists. Thank you.
Upvotes: 0
Views: 49
Reputation: 10301
Maybe this will work? Not very R-ish so maybe someone has a better approach.
fill_na <- function(df, colname) {
  # x is the running count of consecutive NA rows;
  # it resets to 1 whenever a non-NA value is seen.
  x <- 1
  for (i in seq_len(nrow(df))) {
    if (is.na(df[i, colname])) {
      df[i, colname] <- x
      x <- x + 1
    } else {
      x <- 1
    }
  }
  df
}
df_hope <- unsplit(lapply(split(df_raw, f = df_raw$Req_BU), fill_na, colname = "last_rec_date"), f = df_raw$Req_BU)
Edit: I made a clearer example to test the approach:
example_df <- structure(list(date = structure(c(17075, 17076, 17077, 17108,
17109, 17083, 17084, 17085, 17086, 17087), class = "Date"), Req_BU = c("12018",
"12018", "12018", "12018", "12018", "14004", "14004", "14004",
"14004", "14004"), last_rec_date = c(1L, 1L, 1L, NA, NA, NA,
NA, NA, 1L, 1L)), .Names = c("date", "Req_BU", "last_rec_date"
), row.names = c(1L, 2L, 3L, 4L, 5L, 31L, 32L, 33L, 34L, 35L), class = "data.frame")
> example_df
date Req_BU last_rec_date
1 2016-10-01 12018 1
2 2016-10-02 12018 1
3 2016-10-03 12018 1
4 2016-11-03 12018 NA
5 2016-11-04 12018 NA
31 2016-10-09 14004 NA
32 2016-10-10 14004 NA
33 2016-10-11 14004 NA
34 2016-10-12 14004 1
35 2016-10-13 14004 1
Starting with a dataframe where NA values cross the "border" between "Req_BU" 12018 and 14004, split that dataframe by "Req_BU" values into a list of independent dataframes. Then, use lapply to apply the function above to each individual dataframe before using unsplit to return to a single dataframe.
df_ex <- unsplit(lapply(split(example_df, f = example_df$Req_BU), fill_na, colname = "last_rec_date"), f = example_df$Req_BU)
> df_ex
date Req_BU last_rec_date
1 2016-10-01 12018 1
2 2016-10-02 12018 1
3 2016-10-03 12018 1
4 2016-11-03 12018 1
5 2016-11-04 12018 2
31 2016-10-09 14004 1
32 2016-10-10 14004 2
33 2016-10-11 14004 3
34 2016-10-12 14004 1
35 2016-10-13 14004 1
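Note that the loop above numbers leading NAs 1, 2, 3 (rows 31 to 33), whereas the question asks for a 1 in every leading NA until the first non-NA value. A vectorised base R sketch that handles that case might look like the following. The helper name fill_na_run and the result name df_vec are made up for illustration, and it assumes, like the loop above, that rows inside an NA run are one day apart, so it counts rows rather than subtracting dates.

```r
# Rebuild the small example from above so this block runs on its own.
example_df <- data.frame(
  date = as.Date(c("2016-10-01", "2016-10-02", "2016-10-03", "2016-11-03",
                   "2016-11-04", "2016-10-09", "2016-10-10", "2016-10-11",
                   "2016-10-12", "2016-10-13")),
  Req_BU = rep(c("12018", "14004"), each = 5),
  last_rec_date = c(1L, 1L, 1L, NA, NA, NA, NA, NA, 1L, 1L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fill NAs in one group's vector.
fill_na_run <- function(x) {
  na <- is.na(x)
  # run_id increases at every non-NA value, so an NA run shares the id
  # of the non-NA value that precedes it (id 0 = before any non-NA).
  run_id <- cumsum(!na)
  # Position of each NA within its run: 1, 2, 3, ...
  pos <- ave(as.integer(na), run_id, FUN = cumsum)
  # Leading NAs (no earlier non-NA in the group) are all filled with 1.
  pos[run_id == 0L] <- 1L
  x[na] <- pos[na]
  x
}

# Apply the helper within each Req_BU group.
df_vec <- example_df
df_vec$last_rec_date <- ave(example_df$last_rec_date, example_df$Req_BU,
                            FUN = fill_na_run)

df_vec$last_rec_date
# [1] 1 1 1 1 2 1 1 1 1 1
```

Because fill_na_run works on a plain vector, it should also drop into a dplyr pipeline as df_raw %>% group_by(Req_BU) %>% mutate(last_rec_date = fill_na_run(last_rec_date)), which may fit better if the rest of the analysis is already in dplyr.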
Upvotes: 1