Reputation: 51
When I use sum() with a condition inside summarize(), it does not return the correct answer.
Make a data frame x:
library(dplyr)  # provides %>% and summarize()
x = data.frame(flag = 1, uin = 1, val = 2)
x = rbind(x, data.frame(flag = 2, uin = 2, val = 3))
This is what x looks like:
  flag uin val
1    1   1   2
2    2   2   3
I want the sum of val over all rows and the sum of val over the rows where flag == 2, so I write
x %>% summarize(val = sum(val), val.2 = sum(val[flag == 2]))
and the result is:
  val val.2
1   5    NA
But I expect val.2 to be 3 instead of NA. Note that if I compute the conditional sum first and the total sum second, the result is correct:
x %>% summarize(val.2 = sum(val[flag == 2]), val = sum(val))
  val.2 val
1     3   5
Moreover, if I only calculate the conditional summation, it works fine too:
x %>% summarize(val.2 = sum(val[flag == 2]))
  val.2
1     3
Upvotes: 2
Views: 731
Reputation: 60472
Duplicate names are causing you problems. In this code
x %>% summarize(val = sum(val), val.2 = sum(val[flag == 2]))
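you have two objects named val: the column in the data frame x and the summary created by val = sum(val). Because summarize() evaluates its arguments in order, val refers to the summary value 5, not the column, by the time the second expression runs. Then val[flag == 2] indexes that length-one vector with the two-element logical vector c(FALSE, TRUE), which gives NA because there is no second element. Hence sum(NA) returns NA. This also explains why computing the conditional sum first works. The solution is to not use val twice: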
x %>% summarize(val_sum = sum(val), val.2 = sum(val[flag == 2]))
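The indexing step can be reproduced in base R without dplyr. This is only a sketch of the mechanism described above, assuming summarize() makes the result of val = sum(val) visible to later expressions under the name val. The names val_after, idx and picked are illustrative and do not appear in the original code.

```r
# Values taken from the question's data frame x
val  <- c(2, 3)
flag <- c(1, 2)

# Step 1: after val = sum(val), the name val refers to a length-one result
val_after <- sum(val)      # 5

# Step 2: the condition still has one element per row of x
idx <- flag == 2           # FALSE TRUE

# Indexing a length-one vector with a longer logical index:
# position 2 is selected but does not exist, so R returns NA
picked <- val_after[idx]   # NA
sum(picked)                # NA, matching val.2 in the question

# With the original column, the subset is the expected one
sum(val[idx])              # 3
```

Giving the total a different name, as in val_sum above, keeps the column val intact for the second expression.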
Upvotes: 4