Reputation: 2403
Apologies for the vague title, but part of the issue is that I don't know the jargon to describe what I'm trying to do here.
I have data that looks like this:
STATE COUNT DATE
LA 20 20160401
ME 83 20160601
MD 4 20160701
And what I want to do is be able to pick a date (let's say 20160301) and manipulate my data into the following:
STATE COUNT DATE NEWER_THAN
LA 20 20160401 20160301
LA 20 20160401 20160401
ME 83 20160601 20160301
ME 83 20160601 20160401
ME 83 20160601 20160501
ME 83 20160601 20160601
MD 4 20160701 20160301
MD 4 20160701 20160401
MD 4 20160701 20160501
MD 4 20160701 20160601
MD 4 20160701 20160701
So LA had a date of 20160401 in the first table, so it ends up with 2 rows in the second table since March and April are on or after the date I picked of 20160301.
Likewise, ME had a date of 201601 in the first table, so it ends up with 4 rows in the second table since March, April, May, and June are on or after the date I picked of 20160301.
And my question is whether or not anyone can get me started on how I might achieve this type of data manipulation in R.
Upvotes: 0
Views: 64
Reputation: 560
If you convert your "DATE" field into an actual Date object, you can get the difference between 2 Dates by substracting them. And from that difference you know, how many times you have to repeat each row. Following your eaxmple:
dataset <- data.frame(STATE = c("LA", "ME", "MD"),
COUNT = c(20, 83, 4),
DATE = c("20160401", "20160601", "20160701"),
stringsAsFactors = F)
format <- "%Y%m%d"
dataset$DATE <- strptime(dataset$DATE, format)
newDate <- strptime("20160301", format)
n.times <- ceiling(difftime(dataset$DATE, newDate, units = "days")/30)
dataset[rep(seq_len(nrow(dataset)), n.times), ]
Will result in the following:
STATE COUNT DATE
LA 20 2016-04-01
LA 20 2016-04-01
ME 83 2016-06-01
ME 83 2016-06-01
ME 83 2016-06-01
ME 83 2016-06-01
MD 4 2016-07-01
MD 4 2016-07-01
MD 4 2016-07-01
MD 4 2016-07-01
MD 4 2016-07-01
Upvotes: 1