Djpengo
Djpengo

Reputation: 388

Fill in missing values (nacof/nocb) in character column by group

Say that I have dataset like this:

   Object       date date_data
 1:      N       <NA>          
 2:      A       <NA>          
 3:      A       <NA>          
 4:      A 2020-01-01 something
 5:      B       <NA>          
 6:      B       <NA>          
 7:      B 2020-01-01 something
 8:      C       <NA>          
 9:      C       <NA>          
10:      C 2020-01-01 something

Generated by

example <- data.table(Object = rep(LETTERS[1:3], each=3), date = as.Date(rep(c(NA,NA,"2020-01-01"),3)), date_data = rep(c("","","something")))
example <- rbind(list(Object="N", date=as.Date(NA), date_data = ""), example)

I know that I can fill in the missing dates for each group using nafill:

example[, date:= nafill(date,"nocb"), by=Object]
    Object       date date_data
 1:      N       <NA>          
 2:      A 2020-01-01          
 3:      A 2020-01-01          
 4:      A 2020-01-01 something
 5:      B 2020-01-01          
 6:      B 2020-01-01          
 7:      B 2020-01-01 something
 8:      C 2020-01-01          
 9:      C 2020-01-01          
10:      C 2020-01-01 something

What I would like to do is to fill in the date_data column in the same way as the date column. I cannot do that with the datatable's nafill option as it doesn't work with character values.

What would be a simple workaround? I have seen similar questions, but didn't find any that refer to the same problem

Edit

@Ronak Shah's data.table works for the original example. But if I change it to

example <- data.table(Object = c(rep("A",6), rep("B",3)), date = as.Date(rep(c(NA,NA,"2020-01-01"),3)), date_data = rep(c("","","something")))
example <- rbind(list(Object="N", date=as.Date(NA), date_data = ""), example)


   Object       date date_data
 1:      N       <NA>          
 2:      A       <NA>          
 3:      A       <NA>          
 4:      A 2020-01-01 something
 5:      A       <NA>          
 6:      A       <NA>          
 7:      A 2020-01-01 something
 8:      B       <NA>          
 9:      B       <NA>          
10:      B 2020-01-01 something

na.fill does not longer work, I get the error message: Supplied 4 items to be assigned to group 2 of size 6 in column 'date_data'. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code.

Upvotes: 2

Views: 2296

Answers (2)

chinsoon12
chinsoon12

Reputation: 25225

Another option using data.table applying to all types of columns at once:

#change empty string to NAs if there are more columns, use set
example[date_data=="", date_data := NA_character_]

cols <- c("date", "date_data")
example[, (cols) := lapply(.SD, 
      function(x) x[nafill(replace(seq.int(.N), is.na(x), NA_integer_), "nocb")]), 
  Object, .SDcols=cols]

Upvotes: 3

Ronak Shah
Ronak Shah

Reputation: 388817

You can replace empty values with NA and use zoo::na.locf.

library(data.table)

example[, date_data := zoo::na.locf(replace(date_data, date_data == "", NA)), Object]
example

#   Object       date date_data
# 1:      N       <NA>          
# 2:      A 2020-01-01 something
# 3:      A 2020-01-01 something
# 4:      A 2020-01-01 something
# 5:      B 2020-01-01 something
# 6:      B 2020-01-01 something
# 7:      B 2020-01-01 something
# 8:      C 2020-01-01 something
# 9:      C 2020-01-01 something
#10:      C 2020-01-01 something

and similarly using tidyr's fill :

library(dplyr)

example %>%
  mutate(date_data = replace(date_data, date_data == "", NA)) %>%
  group_by(Object) %>%
  tidyr::fill(date_data, .direction = "up")

Upvotes: 3

Related Questions