sum() with conditions gives an incorrect result in the dplyr package

Question

When I apply sum() with a condition inside summarize(), it does not return the result I expect.

Make a data frame x:

library(dplyr)  # provides %>% and summarize()

x = data.frame(flag = 1, uin = 1, val = 2)
x = rbind(x, data.frame(flag = 2, uin = 2, val = 3))

This is what x looks like:

  flag uin val
1    1   1   2
2    2   2   3

I want the sum of val over all rows and the sum of val over the rows where flag == 2, so I write:

x %>% summarize(val = sum(val), val.2 = sum(val[flag == 2]))

and the result is:

  val val.2
1   5    NA

But I expect val.2 to be 3, not NA. Notably, if I compute the conditional sum first and the total sum second, I get the correct answer:

x %>% summarize(val.2 = sum(val[flag == 2]), val = sum(val))
  val.2 val
1     3   5

If I compute only the conditional sum, it also works:

x %>% summarize(val.2 = sum(val[flag == 2]))
  val.2
1     3
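
The three results above are consistent with one likely explanation, offered here as a sketch rather than a confirmed diagnosis: summarize() appears to evaluate its arguments in order, and a later expression can see a column created by an earlier one. Under that assumption, once val = sum(val) has run, the name val in the second expression refers to the length-one total (5) rather than the original column, and indexing it with the two-element condition flag == 2 yields NA. The name val.total below is a hypothetical choice for illustration; any name that does not shadow an input column should behave the same way.

```r
library(dplyr)

# Same data as in the question, built in one call.
x <- data.frame(flag = c(1, 2), uin = c(1, 2), val = c(2, 3))

# Base R view of what the second expression would see if `val`
# had already been replaced by its total (assumption, see above).
total  <- sum(x$val)            # length-one vector holding 5
masked <- total[x$flag == 2]    # logical index c(FALSE, TRUE) picks element 2, which is NA

# Workaround sketch: give the total a name that does not shadow
# the input column, so val[flag == 2] still refers to the original data.
res <- x %>%
  summarize(val.total = sum(val),
            val.2     = sum(val[flag == 2]))
print(res)
```

Reordering the arguments, as in the second example of the question, sidesteps the same name clash because val is only overwritten after the conditional sum has been computed.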
