Nick Knauer
Nick Knauer

Reputation: 4243

Nested Ifelse statement in same table

I have a dataset below:

Col1     Col2    Spend
   A        0      100
   A        0      100
   B        0      100
   C        0      100
   D        0      200

I want to write an ifelse statement stating, if the sum of Col2 is greater than 0, then set Col2 = Spend. If the sum of Col2 is not greater than 0, then only apply the Spend Column to the corresponding rows where the value does not equal A and leave the rest as Col2 original values.

I want my final output to look like this:

Col1     Col2    Spend
   A        0      100
   A        0      100
   B      100      100
   C      100      100
   D      200      200

I think it would be something like this:

df$Col2 <- ifelse(sum(df$Col2)>0, df$Spend, ifelse(df$Col1!="A", df$Spend, df$Col2))

My issue is that when I run this, I check the sum again for Col2 and it still doesn't change. Not sure what I am doing wrong here.

Upvotes: 0

Views: 49

Answers (2)

SebastianP
SebastianP

Reputation: 26

This problem has to do with how R handles vectorized operations, since the first argument to your ifelse statement returns a logical of length 1 (since sum(df$Col2) > 0 can only return True, False or NA), only one of the values is being used in the others.

Here would be an example of what is going on

> ifelse(TRUE, 1:4, 1:4)
[1] 1
> ifelse(c(TRUE, TRUE, FALSE, FALSE), 1:4, 1:4)
[1] 1 2 3 4

So your example would be fixed by changing the syntax as such

if (sum(df$Col2)>0) {
    df$Col2 <- df$Spend
} else {
    df$Col2 <- ifelse(df$Col1!="A", df$Spend, df$Col2)
}

And if you REALLY WANT a one-liner

df$Col2 <- ifelse(rep(sum(df$Col2)>0, nrow(df)), df$Spend, ifelse(df$Col1!="A", df$Spend, df$Col2))

Upvotes: 1

AntoniosK
AntoniosK

Reputation: 16121

The dataset

df = read.table(text = "
Col1     Col2    Spend
A        0      100
A        0      100
B        0      100
C        0      100
D        0      200
", header=T)

The problem

If you run your code you'll see that it returns one 0 value, which is then replicated to fit the length of your column. This doesn't return five 0s.

ifelse(sum(df$Col2)>0, df$Spend, ifelse(df$Col1!="A", df$Spend, df$Col2))

# [1] 0

This happens because you can see from ?ifelse that "ifelse returns a value with the same shape as test..." and your (test) sum(df$Col2)>0 will return only one value, becasue sum(df$Col2) is one value (i.e. the sum).

The solution

You can use an if ... else statement with a nested ifelse like this:

if (sum(df$Col2)>0) df$Spend else ifelse(df$Col1!="A", df$Spend, df$Col2)

# [1]   0   0 100 100 200

So, you check that if (your test) sum(df$Col2)>0 is true, then return the whole Spend column, otherwise continue to the ifelse statement.

Upvotes: 3

Related Questions