Mark Miller
Mark Miller

Reputation: 13123

arithmetic by row with missing observations

I was trying to perform an arithmetic operation on a data set by row and could not get the answer I wanted because of missing observations. When I used, for example, apply I obtained missing observations. Eventually I got the desired answer using rowSums by first multiplying some columns by -1 (instead of subtracting them). I also obtained the desired answer using a loop. These three approaches, and one other, are show below.

Is it possible to obtained the desired answer with apply? Is there a simpler way that I am overlooking? My question is similar to this one: Handling missing/incomplete data in R--is there function to mask but not remove NAs? One answer there suggested looking at the zoo package, although I really prefer base R. Thank you for any advice. This seems like a basic question. Sorry if I am overlooking an obvious solution. (I just noticed there is no tag for missing observations.)

df.1 <- read.table(text='
AA   BB   CC   DD   EE   FF   GG   HH   II   JJ
100   2    3    4    5    6    7    8    9   10
200  12   13   14   15   16   17   18   19   20
300  NA   23   24   25   26   NA   28   29   30
400  32   33   34   35   36   37   38   NA   40
500  42   43   44   45   46   47   48   49   50
', header=T, na.strings = "NA")

# this does not work

apply(df.1[,c("AA","BB", "CC", "DD", "FF", "GG", "II", "JJ")], 1, 
             function(x) {x[1] - (x[2] + x[3] + x[4]) - (x[5] + x[6] + x[7]) + x[8]})

# [1]  79 129  NA  NA 279

df.2 <- data.frame(df.1$AA, df.1$BB, df.1$CC, df.1$DD, df.1$FF, df.1$GG, df.1$II, df.1$JJ)
colnames(df.2) <- c('AA', 'BB', 'CC', 'DD', 'FF', 'GG', 'II', 'JJ')

# this does not work

df.2$AA - (df.2$BB + df.2$CC + df.2$DD) - (df.2$FF + df.2$GG + df.2$II) + df.2$JJ
# [1]  79 129  NA  NA 279

# this works

df.3 <- data.frame(df.2$AA, -1 * df.2$BB, -1 * df.2$CC, -1 * df.2$DD, -1 * df.2$FF, -1 * df.2$GG, -1 * df.2$II, df.2$JJ)

rowSums(df.3, na.rm = TRUE)
# [1]  79 129 228 268 279

# this works

my.sum <- rep(NA, nrow(df.2))

for (i in 1:nrow(df.2)) {
  my.sum[i] <- sum(df.2$AA[i], -1 * df.2$BB[i], -1 * df.2$CC[i], -1 * df.2$DD[i], -1 * df.2$FF[i], -1 * df.2$GG[i], -1 * df.2$II[i], df.2$JJ[i], na.rm = TRUE)
}

my.sum
# [1]  79 129 228 268 279

Upvotes: 0

Views: 128

Answers (2)

James
James

Reputation: 66874

You can just use sum in an apply call and weight each vector accordingly.

weights <- c(1,rep(-1,3),0,rep(-1,2),0,-1,1)

apply(df.1,1,function(x) sum(x*weights,na.rm=T))
[1]  79 129 228 268 279

Although it is perhaps quicker to use colSums on the transposed matrix multiplied by these weights:

colSums(t(df.1)*weights,na.rm=T)
[1]  79 129 228 268 279

Upvotes: 3

Theodore Lytras
Theodore Lytras

Reputation: 3963

How about replacing NAs with 0s on the fly, like:

apply(df.1[,c("AA","BB", "CC", "DD", "FF", "GG", "II", "JJ")], 1, 
             function(x) {x[is.na(x)]=0; x[1] - (x[2] + x[3] + x[4]) - (x[5] + x[6] + x[7]) + x[8]})

Upvotes: 2

Related Questions