Reputation: 49
I have a table with dates in column 1 (Column titled "Date") and values in columns 2 through 4 (Columns titled "A"->"C").
Date A B C
1/1/16 X X Y
1/2/16 X Y Y
1/3/16 X Y Z
etc.
I need to create additional 5th column D that contains the following:
For each row:
If 2 out of 3 values in any of the columns A, B or C = X then value in column D = X
If 2 out of 3 values in any of the columns A, B or C = Y then value in column D = Y
for anything else value in column D = Z
Desired output:
Date A B C D
1/1/16 X X Y X
1/2/16 X Y Y Y
1/3/16 X Y Z Z
I have tried the following logic recommended earlier for comparing values but it gives me a bunch of errors, apparently "=" sign is a no-no:
df$D <- NA
df$D[df$A = df$B = X or df$B = df$C = X or df$A = df$C = Y] <- "X"
df$D[df$A = df$B = Y or df$B = df$C = Y or df$A = df$C = Y] <- "Y"
df$D[is.na(df$E)] <- "Z"
Error: unexpected '=' in "df$D[df$A ="
What is the best way to do this?
Upvotes: 0
Views: 99
Reputation: 4554
Try to use ifelse:
df$D<-ifelse(rowSums(df[,2:4]=='X')>=2,'X',ifelse(rowSums(df[,2:4]=='Y')>=2,'Y','Z'))
Upvotes: 0
Reputation: 38500
Here's a method that uses rowMeans
.
# assign all of D to "Z"
df$D <- "Z"
# replace values
df$D[rowMeans(df[, 2:(length(df)-1)] == "X") > 0.5] <- "X"
df$D[rowMeans(df[, 2:(length(df)-1)] == "Y") > 0.5] <- "Y"
A nice feature is that it would grow if you add on more variables.
rowMeans(df[, 2:(length(df)-1)] == "X")
will calculate, for each row, the proportion of elements that have an "X". Your threshold (2 out of 3) is 0.666666... I used 0.5 instead as it is sufficient for the given number of columns. If additional columns were added, you might increase this nearer to 2/3rds.
Upvotes: 1
Reputation: 411
as Imo pointed out =
is for assignment ==
is for comparison. Or is |
. Here's a link to the logical operators.
There might be a nicer solution but here is the correct syntax for what you wanted to do:
df$D <- NA
df$D[(df$A == "X" & df$B == "X") | (df$A == "X" & df$C == "X") |
(df$B == "X" & df$C == "X")] <- "X"
df$D[(df$A == "Y" & df$B == "Y") | (df$A == "Y" & df$C == "Y") |
(df$B == "Y" & df$C == "Y")] <- "Y"
df$D[is.na(df$D)] <- "Z"
Upvotes: 0