Mislav
Mislav

Reputation: 1573

Fill missed dates in R

I have following data frame:

df <- structure(list(Jmbg = c("0402977385033", "0402977385033", "0402977385033", 
"0402977385033", "0402977385033", "0402977385033", "0402977385033", 
"0402977385033", "0402977385033", "0402977385033", "0402977385033", 
"0402977385033", "0402977385033", "0402977385033", "0402977385033", 
"0402977385033", "0402977385033", "0402977385033", "2607954335134", 
"2607954335134", "2607954335134", "2607954335134", "2607954335134"
), DatumOd = structure(c(11260, 11457, 11503, 11603, 11696, 11826, 
11875, 12013, 12043, 12133, 12225, 12225, 12317, 12408, 12499, 
13878, 14425, 16203, 10915, 11687, 12372, 13878, 15217), class = "Date"), 
    DatumDo = structure(c(11455, 11502, 11520, 11695, 11800, 
    11941, 12012, 12042, 12132, 12224, 12225, 12316, 12407, 12498, 
    13877, NA, NA, NA, 11686, 12371, 13877, NA, NA), class = "Date")), row.names = c(NA, 
-23L), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), vars = list(
    Jmbg), drop = TRUE, indices = list(0:17, 18:22), group_sizes = c(18L, 
5L), biggest_group_size = 18L, .Names = c("Jmbg", "DatumOd", 
"DatumDo"), labels = structure(list(Jmbg = c("0402977385033", 
"2607954335134")), row.names = c(NA, -2L), class = "data.frame", vars = list(
    Jmbg), drop = TRUE, indices = list(c(0L, 2L, 3L, 4L, 5L, 
6L, 8L, 9L, 10L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 22L), 
    c(1L, 7L, 11L, 20L, 21L)), group_sizes = c(18L, 5L), biggest_group_size = 18L, .Names = "Jmbg"))

First column is id. Second column is the date when some event start and third column is date when the event finished. As you can see, there are some missing values in column Datumdo. I want to fill this missing values in the following way: If the value is missing and but the DatumOd date is the last for id, it should stay NA. If date is missing and it is not the last DatumOd date, there should be next date from DatumOd column minus one day.

I tried with dplyr:

df_2 <- df %>%
  dplyr::group_by(Jmbg) %>%
  dplyr::mutate(novi_date = ifelse(is.na(DatumDo) & DatumOd != max(DatumOd), "test", DatumDo))

but instead of test there should be next date from DatumOd column for that id. For example, for the last "test", there should be a date 2011-08-30. Of course, you can use any other package (data.table...)

Upvotes: 0

Views: 407

Answers (1)

moman822
moman822

Reputation: 1954

Here is an answer using data.table. It uses the shift function to find the leading date (minus one day) for each row and assigns it to a new column, dayBeforeNext, by id. From there, an ifelse assigns the new dates to another new variable depending on whether there is already a date in DatumDo.

I needed to convert the dates to characters before assigning them, then back to date, because there appears to be an bug assigning date types in ifelse statements by reference with data.table.

setDT(df)
df[, dayBeforeNext := as.character(shift(DatumOd, type='lead')-1), by=Jmbg]
df[, DatumDo := as.character(DatumDo)]

df[, new := ifelse(is.na(DatumDo), dayBeforeNext, DatumDo)]

df[, new := as.Date(new)]

Your new `data.table will have extra columns so you can reassign those/remove them as you see fit.

Upvotes: 1

Related Questions