Reputation: 854
I have data that looks like this:
df <- read.table(tc <- textConnection("
var1 var2 var3 var4
1 1 7 NA
4 4 NA 6
2 NA 3 NA
4 4 4 4
1 3 1 1"), header = TRUE); close(tc)
I'm trying to create a new column that returns 1 if any column in that row contains the value 1, or 0 if none does.
My non-working code looks like this:
df$var5 = ifelse("1" %in% df$var1,1,
ifelse("1" %in% df$var2,1,
ifelse("1" %in% df$var3,1,
ifelse("1" %in% df$var4,1,0))))
giving me a table:
var1 var2 var3 var4 var5
1 1 7 NA 1
4 4 NA 6 1
2 NA 3 NA 1
4 4 4 4 1
1 3 1 1 1
The table I actually want should look like
var1 var2 var3 var4 var5
1 1 7 NA 1
4 4 NA 6 0
2 NA 3 NA 0
4 4 4 4 0
1 3 1 1 1
I've looked at the posts:
ifelse not working as expected in R
and
Loop over rows of dataframe applying function with if-statement
but I couldn't get any answer to my problem.
Upvotes: 3
Views: 319
Reputation: 887148
The correct way should be
with(df, ifelse(var1 %in% 1,1,
ifelse(var2 %in% 1,1,
ifelse(var3 %in% 1,1,
ifelse(var4 %in% 1,1,0)))))
#[1] 1 0 0 0 1
The reason is that 1 %in% df$var1 returns only a single logical value: it asks whether 1 occurs anywhere in the column.
1 %in% df$var1
#[1] TRUE
Likewise, every column contains a 1, so each ifelse test is a single TRUE, and the resulting single value 1 is recycled to every row.
In contrast, the reverse order
df$var1 %in% 1
#[1] TRUE FALSE FALSE FALSE TRUE
returns a logical vector with the same length as the original column. In essence, the length of the result of %in% is determined by the length of the object on its left-hand side.
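To make the length difference concrete, here is a small sketch that rebuilds the question's data and compares both argument orders. The variable names (scalar_test, single_value, row_test, na_test) are only illustrative and do not appear in the original posts.

```r
# Rebuild the example data from the question.
df <- read.table(text = "
var1 var2 var3 var4
1 1 7 NA
4 4 NA 6
2 NA 3 NA
4 4 4 4
1 3 1 1", header = TRUE)

# Scalar on the left: a single answer for the whole column.
scalar_test <- 1 %in% df$var1
length(scalar_test)
#[1] 1

# ifelse() returns a result as long as its test, so this is one value,
# which an assignment such as df$var5 = ... then recycles down every row.
single_value <- ifelse(scalar_test, 1, 0)

# Column on the left: one answer per row.
row_test <- df$var1 %in% 1
length(row_test)
#[1] 5

# %in% never returns NA, so missing cells simply count as "no match".
na_test <- NA %in% 1
```

This is also why the nested ifelse version needs no special NA handling: an NA cell on the left-hand side of %in% yields FALSE rather than NA.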
ifelse is not actually required. A better option is to use rowSums on the logical matrix (df == 1), check whether the result is not equal to 0, and convert it to binary with as.integer.
as.integer(rowSums(df == 1, na.rm = TRUE) != 0)
#[1] 1 0 0 0 1
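The one-liner can be unpacked into its intermediate steps. The following is just a sketch of the same idea; the names is_one, hits and var5 are introduced here for illustration.

```r
# Rebuild the example data from the question.
df <- read.table(text = "
var1 var2 var3 var4
1 1 7 NA
4 4 NA 6
2 NA 3 NA
4 4 4 4
1 3 1 1", header = TRUE)

# Step 1: logical matrix, TRUE where a cell equals 1, NA where the cell is NA.
is_one <- df == 1

# Step 2: count the TRUEs in each row; na.rm = TRUE ignores the NA cells.
# The counts per row are 2, 0, 0, 0 and 3.
hits <- rowSums(is_one, na.rm = TRUE)

# Step 3: any non-zero count means the row contains at least one 1.
var5 <- as.integer(hits != 0)
var5
#[1] 1 0 0 0 1
```

Assigning the result with df$var5 <- var5 would add it as the new column.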
Another option is Reduce with |:
as.integer(Reduce(`|`, lapply(replace(df, is.na(df), 0), `==`, 1)))
#[1] 1 0 0 0 1
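Split into steps, the Reduce version reads roughly as follows. This is a sketch of the same call with illustrative intermediate names (no_na, per_col, any_col) that are not part of the answer.

```r
# Rebuild the example data from the question.
df <- read.table(text = "
var1 var2 var3 var4
1 1 7 NA
4 4 NA 6
2 NA 3 NA
4 4 4 4
1 3 1 1", header = TRUE)

# Step 1: replace every NA with 0, a value that can never equal 1.
no_na <- replace(df, is.na(df), 0)

# Step 2: compare each column with 1, giving one logical vector per column.
per_col <- lapply(no_na, `==`, 1)

# Step 3: combine the columns with an element-wise OR, left to right.
any_col <- Reduce(`|`, per_col)

# Step 4: convert TRUE/FALSE to 1/0.
as.integer(any_col)
#[1] 1 0 0 0 1
```

Replacing the NAs first matters because NA | FALSE is NA, which would otherwise leak into the result for rows 2 and 3.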
Upvotes: 2
Reputation: 388982
Instead of using ifelse separately for every column, you can check row-wise whether 1 exists anywhere in the row and return 1 or 0 accordingly:
as.numeric(apply(df, 1, function(x) any(x == 1)) %in% TRUE)
#[1] 1 0 0 0 1
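Just to explain the steps better: any() returns NA for rows that have no 1 but do contain an NA, and %in% TRUE turns those NAs into FALSE.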
apply(df, 1, function(x) any(x == 1))
#[1] TRUE NA NA FALSE TRUE
apply(df, 1, function(x) any(x == 1)) %in% TRUE
#[1] TRUE FALSE FALSE FALSE TRUE
as.numeric(apply(df, 1, function(x) any(x == 1)) %in% TRUE)
#[1] 1 0 0 0 1
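As a possible variant (not from the answer above), the NA rows can also be handled by passing na.rm = TRUE through apply to any(), which makes the %in% TRUE step unnecessary. A minimal sketch, with has_one and var5 as illustrative names:

```r
# Rebuild the example data from the question.
df <- read.table(text = "
var1 var2 var3 var4
1 1 7 NA
4 4 NA 6
2 NA 3 NA
4 4 4 4
1 3 1 1", header = TRUE)

# apply() forwards extra arguments to the function, so each row is
# evaluated as any(row, na.rm = TRUE) and NA cells are dropped first.
has_one <- apply(df == 1, 1, any, na.rm = TRUE)

# Convert TRUE/FALSE to 1/0.
var5 <- as.integer(has_one)
var5
#[1] 1 0 0 0 1
```

Whether this reads better than the %in% TRUE form is a matter of taste; both give the same result on this data.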
Upvotes: 1