Reputation: 4243
I have a dataset below:
Col1 Col2 Spend
A 0 100
A 0 100
B 0 100
C 0 100
D 0 200
I want to write an ifelse statement stating, if the sum of Col2 is greater than 0, then set Col2 = Spend. If the sum of Col2 is not greater than 0, then only apply the Spend Column to the corresponding rows where the value does not equal A
and leave the rest as Col2 original values.
I want my final output to look like this:
Col1 Col2 Spend
A 0 100
A 0 100
B 100 100
C 100 100
D 200 200
I think it would be something like this:
df$Col2 <- ifelse(sum(df$Col2)>0, df$Spend, ifelse(df$Col1!="A", df$Spend, df$Col2))
My issue is that when I run this, I check the sum again for Col2 and it still doesn't change. Not sure what I am doing wrong here.
Upvotes: 0
Views: 49
Reputation: 26
This problem has to do with how R handles vectorized operations, since the first argument to your ifelse
statement returns a logical of length 1 (since sum(df$Col2) > 0
can only return True
, False
or NA
), only one of the values is being used in the others.
Here would be an example of what is going on
> ifelse(TRUE, 1:4, 1:4)
[1] 1
> ifelse(c(TRUE, TRUE, FALSE, FALSE), 1:4, 1:4)
[1] 1 2 3 4
So your example would be fixed by changing the syntax as such
if (sum(df$Col2)>0) {
df$Col2 <- df$Spend
} else {
df$Col2 <- ifelse(df$Col1!="A", df$Spend, df$Col2)
}
And if you REALLY WANT a one-liner
df$Col2 <- ifelse(rep(sum(df$Col2)>0, nrow(df)), df$Spend, ifelse(df$Col1!="A", df$Spend, df$Col2))
Upvotes: 1
Reputation: 16121
The dataset
df = read.table(text = "
Col1 Col2 Spend
A 0 100
A 0 100
B 0 100
C 0 100
D 0 200
", header=T)
The problem
If you run your code you'll see that it returns one 0
value, which is then replicated to fit the length of your column. This doesn't return five 0
s.
ifelse(sum(df$Col2)>0, df$Spend, ifelse(df$Col1!="A", df$Spend, df$Col2))
# [1] 0
This happens because you can see from ?ifelse
that "ifelse returns a value with the same shape as test..." and your (test) sum(df$Col2)>0
will return only one value, becasue sum(df$Col2)
is one value (i.e. the sum).
The solution
You can use an if ... else
statement with a nested ifelse
like this:
if (sum(df$Col2)>0) df$Spend else ifelse(df$Col1!="A", df$Spend, df$Col2)
# [1] 0 0 100 100 200
So, you check that if (your test) sum(df$Col2)>0
is true, then return the whole Spend
column, otherwise continue to the ifelse
statement.
Upvotes: 3