Reputation: 3427
I have a data frame of leases with start/end years like this
region=c("a","b","c","d")
lease=c("x","y","z","k")
startyr=c(2000,2001,2003,2002)
endyr=c(2004,2004,2006,2005)
annualAmt=c(7000,8500,6000,5500)
df=data.frame(region,lease,startyr,endyr,annualAmt)
I want to spread out the years by reshaping the data frame into this desired output:
region lease 2000 2001 2002 ... 2006
a x 7000 7000 7000 7000 7000 0 0
b y 0 8500 8500 8500 8500 0 0
The logic is that if a lease covers year 2000-2004, its amt will count into the 2000,2001..2004 col
What's the best way to do it? If I write a loop, how should iI name the newly created year cols 2000-2006? Or should I use apply?
Upvotes: 1
Views: 282
Reputation: 193517
Here's an alternative that mostly involves basic addition and subtraction:
Rows <- df$endyr - df$startyr # How many times to repeat rows?
df <- df[rep(rownames(df), Rows), ] # Repeat the rows
df$year <- df$startyr + sequence(Rows) - 1 # Add a new "year" variable
reshape(df, direction = "wide", # Reshape, long to wide
idvar = c("region", "lease"), # idvars are the first two cols
timevar = "year", # timevar is the new year col
drop = c("startyr", "endyr")) # and drop the start/endyr cols
# region lease annualAmt.2000 annualAmt.2001 annualAmt.2002
# 1 a x 7000 7000 7000
# 2 b y NA 8500 8500
# 3 c z NA NA NA
# 4 d k NA NA 5500
# annualAmt.2003 annualAmt.2004 annualAmt.2005
# 1 7000 NA NA
# 2 8500 NA NA
# 3 6000 6000 6000
# 4 5500 5500 NA
Alternatively, you can use "data.table", like this:
library(data.table)
## Start with your original df
dt <- data.table(df)
dcast.data.table(
DT[, list(year = seq(startyr, endyr),
annualAmt),
by = list(region, lease)],
region + lease ~ year,
value.var = "annualAmt", fill = 0)
# region lease 2000 2001 2002 2003 2004 2005 2006
# 1: a x 7000 7000 7000 7000 7000 0 0
# 2: b y 0 8500 8500 8500 8500 0 0
# 3: c z 0 0 0 6000 6000 6000 6000
# 4: d k 0 0 5500 5500 5500 5500 0
Upvotes: 1
Reputation: 206197
How about
years <- seq(min(df$startyr), max(df$endyr))
dd <- data.frame(region, lease, t(mapply(function(a,b, v) {
v* !is.na(match(years, seq(a, b)))
}, startyr, endyr, annualAmt)))
names(dd)[-(1:2)]<-years
dd
which returns
region lease 2000 2001 2002 2003 2004 2005 2006
1 a x 7000 7000 7000 7000 7000 0 0
2 b y 0 8500 8500 8500 8500 0 0
3 c z 0 0 0 6000 6000 6000 6000
4 d k 0 0 5500 5500 5500 5500 0
Upvotes: 0