I want to loop through a data frame and create a new column that says 'YES' if the 2nd to 4th elements in the row are all 'ANOMALY', and 'NO' otherwise.
for (j in 1:nrow(residual_anomalies)) {
  if (all(residual_anomalies[j, 2:4] == 'ANOMALY')) {
    residual_anomalies$Prediction_Anomaly[j] <- 'YES'
  } else {
    residual_anomalies$Prediction_Anomaly[j] <- 'NO'
  }
}
The above is what I'm currently using. It works, but it is slow, so I'm trying to vectorize it. So far I have written a function that returns 'YES' or 'NO' depending on whether all the elements of the row are 'ANOMALY'.
vote_for_anomaly <- function(x) {
  if (all(x) == 'ANOMALY') return('YES') else return('NO')
}
I then tried to use the apply function in R:
aggregates <- apply(residual_anomalies[,2:4],1,vote_for_anomaly)
but I get the following error and warning:
Error in if (all(x) == "ANOMALY") return("ANOMALY") else return("NO SIGNAL") :
missing value where TRUE/FALSE needed
In addition: Warning message:
In all(x) : coercing argument of type 'character' to logical
Can someone tell me why this isn't working and how I should change this?
You can use this data for testing and call it residual_anomalies
1  ANOMALY    ANOMALY    ANOMALY    ANOMALY
2  ANOMALY    NO SIGNAL  ANOMALY    ANOMALY
3  ANOMALY    ANOMALY    ANOMALY    ANOMALY
4  NO SIGNAL  ANOMALY    NO SIGNAL  ANOMALY
5  ANOMALY    ANOMALY    ANOMALY    ANOMALY
6  NO SIGNAL  NO SIGNAL  ANOMALY    ANOMALY
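For anyone who wants to reproduce this, the sample can be rebuilt roughly as follows. This is only a sketch: the column names V1 to V4 are an assumption (they are the defaults R assigns when no header is supplied), and the values are copied from the six rows listed here.

```r
# Rebuild the six sample rows.
# Column names V1..V4 are assumed; the post does not state them.
residual_anomalies <- data.frame(
  V1 = c("ANOMALY", "ANOMALY", "ANOMALY", "NO SIGNAL", "ANOMALY", "NO SIGNAL"),
  V2 = c("ANOMALY", "NO SIGNAL", "ANOMALY", "ANOMALY", "ANOMALY", "NO SIGNAL"),
  V3 = c("ANOMALY", "ANOMALY", "ANOMALY", "NO SIGNAL", "ANOMALY", "ANOMALY"),
  V4 = c("ANOMALY", "ANOMALY", "ANOMALY", "ANOMALY", "ANOMALY", "ANOMALY"),
  stringsAsFactors = FALSE  # keep the values as plain character strings
)

# Inspect the structure to confirm four character columns and six rows
str(residual_anomalies)
```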
It might be quicker to do this using indexing, rather than ifelse(). First set up a vector of "No" of the required length:
aggregates <- rep("No", NROW(residual_anomalies))
Then assign "Yes" to the elements of this vector for rows where every value in residual_anomalies[, 2:4] equals "ANOMALY":
aggregates[rowSums(residual_anomalies[, 2:4] == "ANOMALY") == 3L] <- "Yes"
This gives:
> aggregates
[1] "Yes" "No" "Yes" "No" "Yes" "No"
The part residual_anomalies[, 2:4] == "ANOMALY" creates a logical matrix:
> residual_anomalies[, 2:4] == "ANOMALY"
V2 V3 V4
[1,] TRUE TRUE TRUE
[2,] FALSE TRUE TRUE
[3,] TRUE TRUE TRUE
[4,] TRUE FALSE TRUE
[5,] TRUE TRUE TRUE
[6,] FALSE TRUE TRUE
When we take the rowSums(), TRUE is converted to 1 and FALSE to 0. Hence only those rows where all three elements are TRUE sum to 3, and only those rows get selected and assigned "Yes".
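The coercion can be checked on a small example. The sketch below uses a hypothetical three-row data frame called toy, with assumed column names V1 to V4; it is not the asker's real data.

```r
# Hypothetical toy data; names and values are for illustration only
toy <- data.frame(
  V1 = c("ANOMALY", "ANOMALY", "NO SIGNAL"),
  V2 = c("ANOMALY", "NO SIGNAL", "ANOMALY"),
  V3 = c("ANOMALY", "ANOMALY", "ANOMALY"),
  V4 = c("ANOMALY", "ANOMALY", "NO SIGNAL"),
  stringsAsFactors = FALSE
)

# Comparing the selected columns with a string gives a logical matrix
is_anomaly <- toy[, 2:4] == "ANOMALY"

# rowSums() treats TRUE as 1 and FALSE as 0, so this counts matches per row
match_count <- rowSums(is_anomaly)

# Start with "No" everywhere, then overwrite rows where every column matched
toy_aggregates <- rep("No", NROW(toy))
toy_aggregates[match_count == ncol(is_anomaly)] <- "Yes"
toy_aggregates
```

Comparing against ncol(is_anomaly) instead of a hard-coded 3 is a small design choice: the test keeps working if the range of columns being checked changes.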
As @lukeA said, you have mixed up your parentheses, but here is a simpler overall solution as well:
aggregates <- ifelse(apply(residual_anomalies, 1,
function(x) all(x[2:4] == "ANOMALY")), "YES", "NO")
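To see why the placement of the parentheses matters, it may help to compare the two forms side by side. This is a sketch with a made-up vector x; vote_for_anomaly_fixed is a hypothetical name for a corrected version of the asker's function.

```r
x <- c("ANOMALY", "NO SIGNAL", "ANOMALY")  # made-up example row

# Original form: all() receives character values, which it cannot
# interpret as logical, so it returns NA (with a coercion warning).
# NA == "ANOMALY" is still NA.
broken <- suppressWarnings(all(x) == "ANOMALY")
is.na(broken)  # an NA condition is what makes if() fail

# Corrected form: compare element-wise first, then reduce with all()
vote_for_anomaly_fixed <- function(x) {
  if (all(x == "ANOMALY")) "YES" else "NO"
}

vote_for_anomaly_fixed(x)
vote_for_anomaly_fixed(c("ANOMALY", "ANOMALY", "ANOMALY"))
```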
Per @lukeA, there's a typo in your code. It should be
all(x == "ANOMALY")
but it would be faster to do:
residual_anomalies$Prediction_Anomaly <-
ifelse(rowSums(residual_anomalies[, 2:4] == "ANOMALY") == 3, "YES", "NO")
rowSums is very fast.
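A rough way to check the speed claim is to time both approaches on synthetic data. This is a sketch only: the data frame big, its size n, and its column names are made up for illustration, and the timings will vary by machine.

```r
set.seed(1)  # reproducible synthetic data
n <- 20000   # arbitrary size, chosen only for illustration

# Hypothetical data: four character columns named V1..V4 by default
big <- as.data.frame(
  matrix(sample(c("ANOMALY", "NO SIGNAL"), 4 * n, replace = TRUE), ncol = 4),
  stringsAsFactors = FALSE
)

# Row-by-row version using apply()
t_apply <- system.time(
  by_apply <- ifelse(apply(big[, 2:4], 1, function(x) all(x == "ANOMALY")),
                     "YES", "NO")
)

# Vectorized version using rowSums()
t_rowsums <- system.time(
  by_rowsums <- ifelse(rowSums(big[, 2:4] == "ANOMALY") == 3, "YES", "NO")
)

# Both approaches should give the same labels (names are dropped first)
stopifnot(identical(unname(by_apply), unname(by_rowsums)))

# Show the two timings next to each other
rbind(apply = t_apply, rowSums = t_rowsums)
```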