Reputation: 91
My sample data looks like this:
data <- read.table(text="group; year; val
a; 1928; 20
a; 1929; 50
a; 1930; 40
a; 1931; 45
b; 1935; -10
b; 1936; -15", sep=";", header=TRUE, stringsAsFactors = FALSE)
> data
  group year val
1     a 1928  20
2     a 1929  50
3     a 1930  40
4     a 1931  45
5     b 1935 -10
6     b 1936 -15
What I would like to do is calculate the cumulative sum relative to 1930 in a new column sum_rel (i.e. 1930 is the start year: values for years after 1930 are added, and values for years up to 1930 are subtracted going backwards). If all years in a group are later than 1930, the reference point (sum_rel = 0) should instead be the lowest year of that group (as in group b). The desired output is:
group year val sum_rel
a     1927        -110
a     1928  20     -90
a     1929  50     -40
a     1930  40       0
a     1931  45      45
b     1934           0
b     1935 -10     -10
b     1936 -15     -25
I had a look at the cumsum function but couldn't figure out how to apply it over groups. I would be very glad if you could help me.
Upvotes: 1
Views: 608
Reputation: 206197
Adding the extra row is probably the trickiest part. This seems to produce the output you are after:
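do.call("rbind", unname(lapply(split(data, data$group), function(x) {
  x <- x[order(x$year), ]                      # sort each group by year
  # position of 1930 in the padded cumsum; 1 (the extra row) if 1930 is absent
  cx <- c(which(x$year == 1930), 0)[1] + 1
  cs <- cumsum(c(0, x$val))                    # running total, leading 0 for the extra row
  # prepend a row for the year before the group's first year, then attach sum_rel
  cbind(rbind(transform(x[1, ], val = NA, year = min(x$year) - 1), x),
        sum_rel = cs - cs[cx])
})))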
#    group year val sum_rel
# 1      a 1927  NA    -110
# 2      a 1928  20     -90
# 3      a 1929  50     -40
# 4      a 1930  40       0
# 5      a 1931  45      45
# 52     b 1934  NA       0
# 51     b 1935 -10     -10
# 6      b 1936 -15     -25
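To see why this lands on the requested numbers, it may help to step through the intermediate values by hand. The sketch below is only an illustration: it rebuilds group a as a standalone data frame (the name x mirrors the argument of the anonymous function above, and years_b is a made-up helper vector for group b), then repeats the same cumsum and indexing steps outside of lapply.

```r
# Group "a" from the question, already sorted by year
x <- data.frame(group = "a",
                year  = c(1928, 1929, 1930, 1931),
                val   = c(20, 50, 40, 45),
                stringsAsFactors = FALSE)

# Running total with a leading 0 that belongs to the extra row (1927)
cs <- cumsum(c(0, x$val))                   # 0 20 70 110 155

# Position of the reference year inside cs (shifted by 1 for the leading 0)
cx <- c(which(x$year == 1930), 0)[1] + 1    # 4

# Shift the running total so that the reference position becomes 0
sum_rel <- cs - cs[cx]                      # -110 -90 -40 0 45

# Group "b" has no 1930, so which() returns integer(0) and the
# appended 0 is picked up instead, pointing at the extra row
years_b <- c(1935, 1936)                    # hypothetical helper vector
cx_b <- c(which(years_b == 1930), 0)[1] + 1 # 1
```

The appended 0 in c(which(...), 0) is what makes the fallback work: when 1930 is missing, the first element is that 0, so the reference becomes the extra row and the group's cumulative sum starts from 0 there.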
Upvotes: 1