Reputation: 91
I need to group data on ID and then replace the missing value of price by iterating on a date value up and down. first, look for 1 date value up and down if no data go 2 date values up and down until there is a mean value for all rows.
df1 <- data.frame(id = c(11,11,11,11,11,11,11,11,555,555,555,555,555,555,555,555,555),
Date = c("1-Jun", "18-Jun", "3-Jul", "4-Jul", "25-Jul", "3-Nov", "7-Nov", "28_Nov",
"1-Jun", "18-Jun", "3-Jul", "4-Jul", "25-Jul", "3-Nov", "7-Nov", "28_Nov",
"30-Nov"),
price = c(NA, NA, 100, NA, 25, NA, 50, NA, 400, NA, NA, NA, NA, NA, NA, NA, 200)
)
Updated requirement:
df1 <- data.frame(id = c(11,11,11,11,11,11,11,11),
Date = c("1-Jun", "5-Jun", "8-Jun", "9-Jun", "14-Jun", "16-Jun", "20-Jun", "21-Jun"),
price = c(NA, NA,100, NA, 50, NA, 200, NA)
)
I need to impute all missing dates between the available dates for each id's and then go symmetrically up and down to impute missing. Also, not always I need the average between two, eg: when I go 2 dates up and down and I see only 1 value, then I would impute that value.
Upvotes: 1
Views: 87
Reputation: 4652
Please find below with a reprex one possible solution using the data.table
library.
I built a function to make it easier to use.
Reprex
NA_imputations()
functionlibrary(data.table)
NA_imputations <- function(x) {
x[, rows := .I]
z <- x[, .I[!is.na(price)]]
id_1 <- z[-length(z)]
id_2 <- z[-1]
values <- x[z, .(price = price, id = id)]
values_1 <- values[-nrow(values)]
names(values_1) <- c("price_1", "id_o1")
values_2 <- values[-1]
names(values_2) <- c("price_2", "id_o2")
subtract <- z[-1] - z[-length(z)]
r <- data.table(id_1, values_1, id_2, values_2, subtract)
Results <- r[, `:=` (id_mean = fifelse(subtract > 2 & subtract %% 2 == 0, id_1+(subtract/2), (id_1+id_2)/2),
mean = fifelse(subtract >= 2 & subtract %% 2 == 0 & id_o1 == id_o2, (price_1+price_2)/2, NA_real_))
][, `:=` (price_1 = NULL, id_o1 = NULL, id_2 = NULL, price_2 = NULL, id_o2 = NULL)
][x, on = .(id_mean = rows)
][, price := fcoalesce(price, mean)
][, mean := NULL
][r[subtract > 2 & subtract %% 2 == 0,id_1]:r[subtract > 2 & subtract %% 2 == 0,id_mean-1], price := lapply(price, nafill, type = "nocb"), by = .(id)
][, price := nafill(price, type = "nocb"), by = .(id)
][, price := nafill(price, type = "locf")
][, `:=` (id_1 = NULL, id_mean = NULL, subtract = NULL)][]
return(Results)
}
NA_imputations()
functionNA_imputations(df1)
#> id Date price
#> <num> <char> <num>
#> 1: 11 1-Jun 100.0
#> 2: 11 18-Jun 100.0
#> 3: 11 3-Jul 100.0
#> 4: 11 4-Jul 62.5
#> 5: 11 25-Jul 25.0
#> 6: 11 3-Nov 37.5
#> 7: 11 7-Nov 50.0
#> 8: 11 28_Nov 50.0
#> 9: 555 1-Jun 400.0
#> 10: 555 18-Jun 400.0
#> 11: 555 3-Jul 400.0
#> 12: 555 4-Jul 400.0
#> 13: 555 25-Jul 300.0
#> 14: 555 3-Nov 200.0
#> 15: 555 7-Nov 200.0
#> 16: 555 28_Nov 200.0
#> 17: 555 30-Nov 200.0
Created on 2021-12-05 by the reprex package (v2.0.1)
Upvotes: 1