Rijin
Rijin

Reputation: 91

how to groupby and take mean of value by symetrically looping forward and backward on the date value in r

I need to group data on ID and then replace the missing value of price by iterating on a date value up and down. first, look for 1 date value up and down if no data go 2 date values up and down until there is a mean value for all rows.

Data Format

df1 <- data.frame(id = c(11,11,11,11,11,11,11,11,555,555,555,555,555,555,555,555,555),
                  Date = c("1-Jun", "18-Jun", "3-Jul", "4-Jul", "25-Jul", "3-Nov", "7-Nov", "28_Nov",
                           "1-Jun", "18-Jun", "3-Jul", "4-Jul", "25-Jul", "3-Nov", "7-Nov", "28_Nov",
                           "30-Nov"),
                  price = c(NA, NA, 100, NA, 25, NA, 50, NA, 400, NA, NA, NA, NA, NA, NA, NA, 200)
                  )

Updated requirement:

df1 <- data.frame(id = c(11,11,11,11,11,11,11,11),
                  Date = c("1-Jun", "5-Jun", "8-Jun", "9-Jun", "14-Jun", "16-Jun", "20-Jun", "21-Jun"),
                  price = c(NA, NA,100, NA, 50, NA, 200, NA)
)

I need to impute all missing dates between the available dates for each id's and then go symmetrically up and down to impute missing. Also, not always I need the average between two, eg: when I go 2 dates up and down and I see only 1 value, then I would impute that value.

enter image description here

Upvotes: 1

Views: 87

Answers (1)

lovalery
lovalery

Reputation: 4652

Please find below with a reprex one possible solution using the data.table library.

I built a function to make it easier to use.

Reprex

  • Code of the NA_imputations() function
library(data.table)

NA_imputations <- function(x) {
  
  x[, rows := .I]
  
  z <- x[, .I[!is.na(price)]]
  
  id_1 <- z[-length(z)]
  id_2 <- z[-1]
  
  values <- x[z, .(price = price, id = id)]
  values_1 <- values[-nrow(values)]
  names(values_1) <- c("price_1", "id_o1")
  values_2 <- values[-1]
  names(values_2) <- c("price_2", "id_o2")
  
  subtract <- z[-1] - z[-length(z)]
  
  r <- data.table(id_1, values_1, id_2, values_2, subtract)
  
  Results <- r[, `:=` (id_mean = fifelse(subtract > 2 & subtract %% 2 == 0, id_1+(subtract/2), (id_1+id_2)/2),
                       mean = fifelse(subtract >= 2 & subtract %% 2 == 0 & id_o1 == id_o2, (price_1+price_2)/2, NA_real_))
               ][, `:=` (price_1 = NULL, id_o1 = NULL, id_2 = NULL, price_2 = NULL, id_o2 = NULL)
                 ][x, on = .(id_mean = rows)
                   ][, price := fcoalesce(price, mean)
                     ][, mean := NULL
                       ][r[subtract > 2 & subtract %% 2 == 0,id_1]:r[subtract > 2 & subtract %% 2 == 0,id_mean-1], price := lapply(price, nafill, type = "nocb"), by = .(id)
                         ][, price := nafill(price, type = "nocb"), by = .(id)
                           ][, price := nafill(price, type = "locf")
                             ][, `:=` (id_1 = NULL, id_mean = NULL, subtract = NULL)][]
  
  return(Results)
}
  • Output of the NA_imputations() function
NA_imputations(df1)
#>        id   Date price
#>     <num> <char> <num>
#>  1:    11  1-Jun 100.0
#>  2:    11 18-Jun 100.0
#>  3:    11  3-Jul 100.0
#>  4:    11  4-Jul  62.5
#>  5:    11 25-Jul  25.0
#>  6:    11  3-Nov  37.5
#>  7:    11  7-Nov  50.0
#>  8:    11 28_Nov  50.0
#>  9:   555  1-Jun 400.0
#> 10:   555 18-Jun 400.0
#> 11:   555  3-Jul 400.0
#> 12:   555  4-Jul 400.0
#> 13:   555 25-Jul 300.0
#> 14:   555  3-Nov 200.0
#> 15:   555  7-Nov 200.0
#> 16:   555 28_Nov 200.0
#> 17:   555 30-Nov 200.0

Created on 2021-12-05 by the reprex package (v2.0.1)

Upvotes: 1

Related Questions