Reputation: 1621
I have a .csv file exported from my R script which looks like this and can be found under the following adress: https://dl.dropboxusercontent.com/u/16277659/Sample_cum.csv
NAME; ID; REFERENCE_YEAR; VALUE1
SAMPLE1; 684; 1914; 0
SAMPLE1; 684; 1915; -18
SAMPLE1; 684; 1927; -6
SAMPLE1; 684; 1928; -20
SAMPLE1; 684; 1929; -12
SAMPLE1; 684; 1931; -20
SAMPLE1; 684; 1933; -6
SAMPLE1; 684; 1934; -25
SAMPLE1; 684; 1935; 0
SAMPLE1; 684; 1936; NA
SAMPLE1; 684; 1937; -4
SAMPLE1; 684; 1938; NA
SAMPLE1; 684; 1939; NA
SAMPLE1; 684; 1945; NA
SAMPLE1; 684; 1946; NA
SAMPLE1; 684; 1957; -5
SAMPLE1; 684; 1958; -5
SAMPLE2; 2923; 1987; 0
SAMPLE2; 2923; 1988; -2
SAMPLE2; 2923; 1989; 0
SAMPLE2; 2923; 1879; -2900
SAMPLE2; 2923; 1892; -5600
SAMPLE2; 2923; 1906; -4300
SAMPLE2; 2923; 1929; 3500
SAMPLE2; 2923; 1987; NA
SAMPLE2; 2923; 1992; NA
SAMPLE2; 2923; 2007; NA
SAMPLE2; 2923; 2008; -17
SAMPLE3; 48; 1980; 0
SAMPLE3; 48; 1981; 2
SAMPLE3; 48; 1982; 1
SAMPLE3; 48; 1983; 1
What I am attempting to do and is causing me problems is to calculate the cumulative sum of the column VALUE1 based on the different (grouped) SAMPLES in NAME.
My code so far looks like this:
# read in data
dat <- read.csv2("https://dl.dropboxusercontent.com/u/16277659/Sample_cum.csv")
dat2 <- transform(dat, CUMVALUE1= ave(VALUE1, NAME, FUN= cumsum))
This results is the following dataframe:
NAME; ID; REFERENCE_YEAR; VALUE1; CUMVALUE1
SAMPLE1; 684; 1914; 0; 0
SAMPLE1; 684; 1915; -18; -18
SAMPLE1; 684; 1927; -6; -24
SAMPLE1; 684; 1928; -20; -44
SAMPLE1; 684; 1929; -12; -56
SAMPLE1; 684; 1931; -20; -76
SAMPLE1; 684; 1933; -6; -82
SAMPLE1; 684; 1934; -25; -107
SAMPLE1; 684; 1935; 0; -107
SAMPLE1; 684; 1936; NA; NA
SAMPLE1; 684; 1937; -4; NA
SAMPLE1; 684; 1938; NA; NA
SAMPLE1; 684; 1939; NA; NA
SAMPLE1; 684; 1945; NA; NA
SAMPLE1; 684; 1946; NA; NA
SAMPLE1; 684; 1957; -5; NA
SAMPLE1; 684; 1958; -5; NA
SAMPLE2; 2923; 1987; 0; 0
SAMPLE2; 2923; 1988; -2; -2
SAMPLE2; 2923; 1989; 0; -2
SAMPLE2; 2923; 1879; -2900; -2902
SAMPLE2; 2923; 1892; -5600; -8502
SAMPLE2; 2923; 1906; -4300; -12802
SAMPLE2; 2923; 1929; 3500; -9302
SAMPLE2; 2923; 1987; NA; NA
SAMPLE2; 2923; 1992; NA; NA
SAMPLE2; 2923; 2007; NA; NA
SAMPLE2; 2923; 2008; -17; NA
SAMPLE3; 48; 1980; 0; 0
SAMPLE3; 48; 1981; 2; 2
SAMPLE3; 48; 1982; 1; 3
SAMPLE3; 48; 1983; 1; 4
My problem with this is that the NA values should be left out when applying the cumsum function. E.g. this extract...
SAMPLE1; 684; 1934; -25; -107
SAMPLE1; 684; 1935; 0; -107
SAMPLE1; 684; 1936; NA; NA
SAMPLE1; 684; 1937; -4; NA
should result in something like this
SAMPLE1; 684; 1934; -25; -107
SAMPLE1; 684; 1935; 0; -107
SAMPLE1; 684; 1936; NA; NA
SAMPLE1; 684; 1937; -4; -111
How can I do that? Thanks for your help.
Upvotes: 0
Views: 220
Reputation: 8267
You can work with an anonymous function instead of cumsum
, which transforms NA
to zeros, calculates cumulative sums, transforms the cumsums corresponding to NA
to NA
and outputs the result:
dat2 <- transform(dat, CUMVALUE1= ave(VALUE1, NAME, FUN=function(xx){
yy<-xx
yy[is.na(xx)]<-0
zz<-cumsum(yy)
zz[is.na(xx)]<-NA
zz}))
Upvotes: 1
Reputation: 132576
dat2 <- transform(dat, CUMVALUE1= ave(VALUE1, NAME, FUN= function(x) {
x[!is.na(x)] <- cumsum(x[!is.na(x)])
x}))
Upvotes: 1