R: Cumulative sum function that is not applied to NAs

Question

I have a .csv file exported from my R script which looks like this and can be found under the following adress: https://dl.dropboxusercontent.com/u/16277659/Sample_cum.csv

NAME;   ID; REFERENCE_YEAR; VALUE1
SAMPLE1;    684;    1914;   0
SAMPLE1;    684;    1915;   -18
SAMPLE1;    684;    1927;   -6
SAMPLE1;    684;    1928;   -20
SAMPLE1;    684;    1929;   -12
SAMPLE1;    684;    1931;   -20
SAMPLE1;    684;    1933;   -6
SAMPLE1;    684;    1934;   -25
SAMPLE1;    684;    1935;   0
SAMPLE1;    684;    1936;   NA
SAMPLE1;    684;    1937;   -4
SAMPLE1;    684;    1938;   NA
SAMPLE1;    684;    1939;   NA
SAMPLE1;    684;    1945;   NA
SAMPLE1;    684;    1946;   NA
SAMPLE1;    684;    1957;   -5
SAMPLE1;    684;    1958;   -5
SAMPLE2;    2923;   1987;   0
SAMPLE2;    2923;   1988;   -2
SAMPLE2;    2923;   1989;   0
SAMPLE2;    2923;   1879;   -2900
SAMPLE2;    2923;   1892;   -5600
SAMPLE2;    2923;   1906;   -4300
SAMPLE2;    2923;   1929;   3500
SAMPLE2;    2923;   1987;   NA
SAMPLE2;    2923;   1992;   NA
SAMPLE2;    2923;   2007;   NA
SAMPLE2;    2923;   2008;   -17
SAMPLE3;    48;     1980;   0
SAMPLE3;    48;     1981;   2
SAMPLE3;    48;     1982;   1
SAMPLE3;    48;     1983;   1

What I am attempting to do and is causing me problems is to calculate the cumulative sum of the column VALUE1 based on the different (grouped) SAMPLES in NAME.

My code so far looks like this:

# read in data
dat <- read.csv2("https://dl.dropboxusercontent.com/u/16277659/Sample_cum.csv")

dat2 <- transform(dat, CUMVALUE1= ave(VALUE1, NAME, FUN= cumsum))

This results is the following dataframe:

NAME;   ID; REFERENCE_YEAR; VALUE1; CUMVALUE1
SAMPLE1;    684;    1914;    0;      0
SAMPLE1;    684;    1915;   -18;    -18
SAMPLE1;    684;    1927;   -6;     -24
SAMPLE1;    684;    1928;   -20;    -44
SAMPLE1;    684;    1929;   -12;    -56
SAMPLE1;    684;    1931;   -20;    -76
SAMPLE1;    684;    1933;   -6;     -82
SAMPLE1;    684;    1934;   -25;    -107
SAMPLE1;    684;    1935;   0;      -107
SAMPLE1;    684;    1936;   NA;     NA
SAMPLE1;    684;    1937;   -4;     NA
SAMPLE1;    684;    1938;   NA;     NA
SAMPLE1;    684;    1939;   NA;     NA
SAMPLE1;    684;    1945;   NA;     NA
SAMPLE1;    684;    1946;   NA;     NA
SAMPLE1;    684;    1957;   -5;     NA
SAMPLE1;    684;    1958;   -5;     NA
SAMPLE2;    2923;   1987;   0;      0
SAMPLE2;    2923;   1988;   -2;     -2
SAMPLE2;    2923;   1989;   0;      -2
SAMPLE2;    2923;   1879;   -2900;  -2902
SAMPLE2;    2923;   1892;   -5600;  -8502
SAMPLE2;    2923;   1906;   -4300;  -12802
SAMPLE2;    2923;   1929;   3500;   -9302
SAMPLE2;    2923;   1987;   NA;     NA
SAMPLE2;    2923;   1992;   NA;     NA
SAMPLE2;    2923;   2007;   NA;     NA
SAMPLE2;    2923;   2008;   -17;    NA
SAMPLE3;    48;     1980;   0;      0
SAMPLE3;    48;     1981;   2;      2
SAMPLE3;    48;     1982;   1;      3
SAMPLE3;    48;     1983;   1;      4

My problem with this is that the NA values should be left out when applying the cumsum function. E.g. this extract...

SAMPLE1;    684;    1934;   -25;    -107
SAMPLE1;    684;    1935;   0;      -107
SAMPLE1;    684;    1936;   NA;     NA
SAMPLE1;    684;    1937;   -4;     NA

should result in something like this

SAMPLE1;    684;    1934;   -25;    -107
SAMPLE1;    684;    1935;   0;      -107
SAMPLE1;    684;    1936;   NA;     NA
SAMPLE1;    684;    1937;   -4;     -111

How can I do that? Thanks for your help.

Stephan Kolassa · Accepted Answer

You can work with an anonymous function instead of cumsum, which transforms NA to zeros, calculates cumulative sums, transforms the cumsums corresponding to NA to NA and outputs the result:

dat2 <- transform(dat, CUMVALUE1= ave(VALUE1, NAME, FUN=function(xx){
  yy<-xx
  yy[is.na(xx)]<-0
  zz<-cumsum(yy)
  zz[is.na(xx)]<-NA
  zz}))

R: Cumulative sum function that is not applied to NAs

Answers (2)

Related Questions