user3791234
user3791234

Reputation: 141

Add rows to a data frame based on date in previous row

I have a data frame precip_range:

start_date<-as.Date(c("2010-4-01", "2010-4-02", "2010-04-04", "2010-07-02", "2010-07-02", "2010-07-03"))  
end_date<-as.Date(c("2010-7-01", "2010-07-01", "2010-07-02", "2010-10-03", "2010-10-04", "2010-10-03"))
date_category<-(c("A", "A", "A", "B", "B", "B"))
site <-c("Site 1", "Site 2", "Site 3", "Site 1", "Site 2", "Site 3")
precip_range<-data.frame(site, start_date, end_date, date_category)
precip_range$days <-(end_date-start_date)

I would like to add a column Date and add rows with values of Date that fill in the date gaps between start_date and end_dateby site. All columns except Date should retain the same information as in precip_range. I want the first few rows of the resulting data frame to look similar to the data frame result_example:

date<-as.Date(c("2010-04-01", "2010-04-02", "2010-04-03", "2010-04-04", "2010-04-05", "2010-04-06"))
result_date_category <-c("A", "A", "A", "A", "A", "A")
result_site <-c("Site 1", "Site 1", "Site 1", "Site 1", "Site 1", "Site 1")
result_start_date <-as.Date(c("2010-04-01", "2010-04-01", "2010-04-01", "2010-04-01", "2010-04-01","2010-04-01"))
result_end_date <-as.Date(c("2010-07-01", "2010-07-01", "2010-07-01", "2010-07-01", "2010-07-01","2010-07-01"))
result_example <-data.frame(date, result_site, result_start_date, result_end_date, result_date_category)
result_example$days <-(result_end_date-result_start_date)

My question is similar to [In R: Add rows with data of previous row to data frame, but I have not be able to successfully adapt that answer for my case. Thank you.

Upvotes: 1

Views: 1111

Answers (1)

David Arenburg
David Arenburg

Reputation: 92300

Try the following

diffs <- with(precip_range, end_date - start_date + 1)
result_site <- precip_range[rep(seq_len(nrow(precip_range)), diffs), ]

library(data.table)
setDT(result_site)[, Date := seq.int(start_date[1], end_date[1], by = "day"), 
                     by = list(site, date_category)]
result_site

#        site start_date   end_date date_category    days       Date
#   1: Site 1 2010-04-01 2010-07-01             A 91 days 2010-04-01
#   2: Site 1 2010-04-01 2010-07-01             A 91 days 2010-04-02
#   3: Site 1 2010-04-01 2010-07-01             A 91 days 2010-04-03
#   4: Site 1 2010-04-01 2010-07-01             A 91 days 2010-04-04
#   5: Site 1 2010-04-01 2010-07-01             A 91 days 2010-04-05
# ---                                                              
# 551: Site 3 2010-07-03 2010-10-03             B 92 days 2010-09-29
# 552: Site 3 2010-07-03 2010-10-03             B 92 days 2010-09-30
# 553: Site 3 2010-07-03 2010-10-03             B 92 days 2010-10-01
# 554: Site 3 2010-07-03 2010-10-03             B 92 days 2010-10-02
# 555: Site 3 2010-07-03 2010-10-03             B 92 days 2010-10-03

Here we computed the date differences and performed row indexing according to the size of each difference. Afterwards, we added the date differences using data.table package (in order to improve performance)

Upvotes: 3

Related Questions