wjang4

Reputation: 127

Vectorized Calculation in R

I was doing some calculation in R and was confused by the logic R uses.

For example,

table <- data.frame(a = c(1, NA, 2, 1), b = c(1, 1, 3, 2))

Here, I want to create a third column, "c".

Column c should be 0 where column a is NA. Otherwise it should be the sum of column a and column b.

So the column c should be

c(2,0,5,3)

I wrote:

table$c <- 0
table$c[!is.na(table$a)] <- table$a + table$b

But I got column c as

c(2,0,NA,5)

I see that

table$c[3] = table$a[2]+table$b[2]

when I wanted it to be table$c[3] = table$a[3] + table$b[3].

I thought R would skip index 2 on both the left and right sides and move on to index 3 in the calculation, but in fact R skipped index 2 on the left side only, not on the right side.

Why does this happen? How should I prevent this?

Thank you.

Upvotes: 0

Views: 342

Answers (2)

amonk

Reputation: 1795

Alternatively, you could make use of the data.table package

library(data.table)
# Create the data.table
table <- data.table(a = c(1, NA, 2, 1), b = c(1, 1, 3, 2))
# Add column c by reference: 0 where a is NA, otherwise a + b
table[, c := ifelse(is.na(a), 0, a + b)]

> table
    a b c
1:  1 1 2
2: NA 1 0
3:  2 3 5
4:  1 2 3
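
For comparison, the same condition can probably be written in base R without loading a package. This is only a sketch, assuming the data frame from the question:

```r
# Rebuild the question's data frame
table <- data.frame(a = c(1, NA, 2, 1), b = c(1, 1, 3, 2))

# Element-wise choice: 0 where a is NA, otherwise a + b for the same row
table$c <- ifelse(is.na(table$a), 0, table$a + table$b)

table$c
```

ifelse() evaluates the condition element by element, so each row of c is computed from the same row of a and b.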

Upvotes: 0

kintany

Reputation: 541

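Use

# Sum only columns a and b, so that an existing column c is not included
table$c <- apply(table[c("a", "b")], 1, sum)
# Rows where a (or b) is NA sum to NA; set those to 0
table$c[is.na(table$c)] <- 0

Or, even simpler if you are just starting to learn R: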

table$c <- table$a + table$b
table$c[is.na(table$c)] <- 0

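To prevent this kind of problem, make sure both sides of an assignment refer to the same rows. In your line

table$c[!is.na(table$a)] <- table$a + table$b

the left side selects only the three rows where a is not NA, but the right side is still the full vector of four sums, c(2, NA, 5, 3). R does not match the two sides by index. It fills the selected positions with the first three values in order, and warns that the number of items to replace is not a multiple of the replacement length. That is why c[3] received the NA from row 2 and c[4] received the 5 from row 3.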
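
If you want to keep the subset-assignment style, one way is to apply the same logical index to both sides, so the lengths and positions line up. A minimal sketch, assuming the data frame from the question; the name `keep` is introduced here only for illustration:

```r
# Rebuild the question's data frame
table <- data.frame(a = c(1, NA, 2, 1), b = c(1, 1, 3, 2))

# `keep` (hypothetical name): TRUE for rows where a is not NA
keep <- !is.na(table$a)

# Start with 0 everywhere, then overwrite only the kept rows
table$c <- 0
# Both sides are subset by the same index, so three values go into three slots
table$c[keep] <- table$a[keep] + table$b[keep]

table$c
```

With the question's data this gives 2 0 5 3, and no length-mismatch warning is raised because the right side now has exactly as many values as the left side has selected positions.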

Upvotes: 2
