Reputation: 808
I have a dataset like this:
id date value
1 8/06/12 1
1 8/08/12 1
2 8/07/12 2
2 8/08/12 1
Every id should a have a value for every date. When an id is missing a particular date, that row needs to be added with a value of 0. E.g.,
id date value
1 8/06/12 1
1 8/07/12 0
1 8/08/12 1
2 8/06/12 0
2 8/07/12 2
2 8/08/12 1
I'm trying to figure out how to add the rows with 0s. There's a good solution here: R - Fill missing dates by group. However, I can't use the tidyr::complete
function because I'm using sparklyr
and, as far as I know, need to stay within dplyr
functions.
Upvotes: 5
Views: 411
Reputation: 35554
expand.grid()
Use expand.grid()
to create all combinations of id
and date
. By the way, notice to transform your date to the class Date
by as.Date()
otherwise it will be a meaningless string.
df %>% mutate(date = as.Date(date, "%m/%d/%y")) %>%
right_join(expand.grid(id = unique(.$id), date = unique(.$date))) %>%
mutate(value = coalesce(value, 0L)) %>%
arrange(id, date)
# id date value
# 1 1 2012-08-06 1
# 2 1 2012-08-07 0
# 3 1 2012-08-08 1
# 4 2 2012-08-06 0
# 5 2 2012-08-07 2
# 6 2 2012-08-08 1
Reproducible Data
df <- structure(list(id = c(1L, 1L, 2L, 2L), date = c("8/06/12", "8/08/12",
"8/07/12", "8/08/12"), value = c(1L, 1L, 2L, 1L)), class = "data.frame", row.names = c(NA,
-4L))
Upvotes: 1
Reputation: 206
In sparklyr
, you must use Spark functions. This is a job for coalesce
. First you have to fill out all the pairs of ids and dates you expect to see, so maybe something like:
(edit)
all_id <- old_data %>% distinct(id) %>% mutate(common=0)
all_date <- old_data %>% distinct(date) %>% mutate(common=0)
all_both <- all_id %>% full_join(all_date,by='common')
data <- old_data %>%
right_join(all_both %>% select(-common),by=c('id','date')) %>%
mutate(value=`coalesce(value,0)`)
I have assumed you have all the dates and ids you care about in your old data, though that might not be the case.
Upvotes: 3