Reputation: 13123
I was trying to perform an arithmetic operation on a data set by row and could not get the answer I wanted because of missing observations. When I used, for example, apply
I obtained missing observations. Eventually I got the desired answer using rowSums
by first multiplying some columns by -1 (instead of subtracting them). I also obtained the desired answer using a loop. These three approaches, and one other, are show below.
Is it possible to obtained the desired answer with apply
? Is there a simpler way that I am overlooking? My question is similar to this one: Handling missing/incomplete data in R--is there function to mask but not remove NAs? One answer there suggested looking at the zoo package, although I really prefer base R. Thank you for any advice. This seems like a basic question. Sorry if I am overlooking an obvious solution. (I just noticed there is no tag for missing observations.)
df.1 <- read.table(text='
AA BB CC DD EE FF GG HH II JJ
100 2 3 4 5 6 7 8 9 10
200 12 13 14 15 16 17 18 19 20
300 NA 23 24 25 26 NA 28 29 30
400 32 33 34 35 36 37 38 NA 40
500 42 43 44 45 46 47 48 49 50
', header=T, na.strings = "NA")
# this does not work
apply(df.1[,c("AA","BB", "CC", "DD", "FF", "GG", "II", "JJ")], 1,
function(x) {x[1] - (x[2] + x[3] + x[4]) - (x[5] + x[6] + x[7]) + x[8]})
# [1] 79 129 NA NA 279
df.2 <- data.frame(df.1$AA, df.1$BB, df.1$CC, df.1$DD, df.1$FF, df.1$GG, df.1$II, df.1$JJ)
colnames(df.2) <- c('AA', 'BB', 'CC', 'DD', 'FF', 'GG', 'II', 'JJ')
# this does not work
df.2$AA - (df.2$BB + df.2$CC + df.2$DD) - (df.2$FF + df.2$GG + df.2$II) + df.2$JJ
# [1] 79 129 NA NA 279
# this works
df.3 <- data.frame(df.2$AA, -1 * df.2$BB, -1 * df.2$CC, -1 * df.2$DD, -1 * df.2$FF, -1 * df.2$GG, -1 * df.2$II, df.2$JJ)
rowSums(df.3, na.rm = TRUE)
# [1] 79 129 228 268 279
# this works
my.sum <- rep(NA, nrow(df.2))
for (i in 1:nrow(df.2)) {
my.sum[i] <- sum(df.2$AA[i], -1 * df.2$BB[i], -1 * df.2$CC[i], -1 * df.2$DD[i], -1 * df.2$FF[i], -1 * df.2$GG[i], -1 * df.2$II[i], df.2$JJ[i], na.rm = TRUE)
}
my.sum
# [1] 79 129 228 268 279
Upvotes: 0
Views: 128
Reputation: 66874
You can just use sum
in an apply
call and weight each vector accordingly.
weights <- c(1,rep(-1,3),0,rep(-1,2),0,-1,1)
apply(df.1,1,function(x) sum(x*weights,na.rm=T))
[1] 79 129 228 268 279
Although it is perhaps quicker to use colSums
on the transposed matrix multiplied by these weights:
colSums(t(df.1)*weights,na.rm=T)
[1] 79 129 228 268 279
Upvotes: 3
Reputation: 3963
How about replacing NAs with 0s on the fly, like:
apply(df.1[,c("AA","BB", "CC", "DD", "FF", "GG", "II", "JJ")], 1,
function(x) {x[is.na(x)]=0; x[1] - (x[2] + x[3] + x[4]) - (x[5] + x[6] + x[7]) + x[8]})
Upvotes: 2