Determine if a substring appears in a string in each row of a data frame

Question

I have a data frame that is revised every day. When an error occurs, it's checked, and if it can be solved, the keyword "REVISED" is added to the beginning of the error message, like so:

ID  M1               M2                M3        
1   NA               "REVISED-error"   "error"    
2   "REVISED-error"  "REVISED-error"   NA        
3   "REVISED-error"  "REVISED-error"   "error"   
4   NA               "error"           NA         
5   NA               NA                NA

I want to add two columns that tell me how many errors each row has and how many of them have been revised, like this:

ID  M1               M2                M3         i1   ix
1   NA               "REVISED-error"   "error"    2    1    <- 2 errors, 1 revised
2   "REVISED-error"  "REVISED-error"   NA         2    2
3   "REVISED-error"  "REVISED-error"   "error"    3    2
4   NA               "error"           NA         1    0
5   NA               NA                NA         0    0

I found this code:

library(dplyr)
df <- df %>% mutate(i1 = rowSums(!is.na(.[2:4])))

That tells me how many errors are in those columns. How can I find out how many of those errors contain the keyword REVISED? I've tried a few things, but none has worked so far:

df <- df %>% mutate(i1 = rowSums(!is.na(.[2:4]))) %>% mutate(ie = rowSums(.[2:4] %in% "REVISED"))

This returns the error: 'x' must be an array of at least two dimensions
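A minimal base-R sketch of why that attempt fails and one way to fix it. It assumes the data frame from the question, rebuilt here with character columns (the answer's dput() further down uses factors, which grepl() also accepts). is.na() on a data frame returns a logical matrix, so rowSums() works on it; %in% tests exact, whole values and, on a data frame, returns one value per column, so rowSums() receives a plain vector and fails. grepl() does substring matching instead.

```r
# Sample data mirroring the question's table (character columns assumed)
df <- data.frame(
  ID = 1:5,
  M1 = c(NA, "REVISED-error", "REVISED-error", NA, NA),
  M2 = c("REVISED-error", "REVISED-error", "REVISED-error", "error", NA),
  M3 = c("error", NA, "error", NA, NA),
  stringsAsFactors = FALSE
)

# is.na() on a data frame gives a 5 x 3 logical matrix,
# so rowSums() can count the non-NA cells per row
print(dim(is.na(df[2:4])))

# %in% compares each column as a single element: the result is a
# length-3 logical vector with no dimensions, hence the rowSums() error
print(df[2:4] %in% "REVISED")

# grepl() matches substrings; applied column by column it returns one
# logical vector per column, which sapply() binds into a 5 x 3 matrix
revised <- sapply(df[2:4], grepl, pattern = "REVISED", fixed = TRUE)

df$i1 <- rowSums(!is.na(df[2:4]))
df$ix <- rowSums(revised)
print(df)
```

Note that sapply() simplifies to a vector rather than a matrix when the data frame has a single row, so for that edge case the result would need wrapping in matrix().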

Ronak Shah · Accepted Answer

You could use apply to count how many times "error" and "REVISED" appear in each row.

# for each row, count the cells containing "error" (i1) and "REVISED" (ix)
df[c("i1", "ix")] <- t(apply(df[-1], 1, function(x) 
                  c(sum(grepl("error", x)), sum(grepl("REVISED", x)))))


df
#  ID            M1            M2    M3 i1 ix
#1  1          <NA> REVISED-error error  2  1
#2  2 REVISED-error REVISED-error  <NA>  2  2
#3  3 REVISED-error REVISED-error error  3  2
#4  4          <NA>         error  <NA>  1  0
#5  5          <NA>          <NA>  <NA>  0  0

An alternative approach uses is.na and rowSums to calculate i1:

df$i1 <- rowSums(!is.na(df[-1]))
df$ix <- apply(df[-1], 1, function(x) sum(grepl("REVISED", x)))
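Since the question says "REVISED" is added to the beginning of the message, a prefix test is a slightly stricter option than grepl("REVISED", x), which matches the keyword anywhere in the string. The sketch below uses base R's startsWith() on all cells at once instead of looping over rows with apply(); it assumes the same data as the dput() under "data" (factor columns), and is one possible variant rather than part of the accepted answer.

```r
# Same values as the dput() data below, with factor columns
df <- data.frame(
  ID = 1:5,
  M1 = factor(c(NA, "REVISED-error", "REVISED-error", NA, NA)),
  M2 = factor(c("REVISED-error", "REVISED-error", "REVISED-error", "error", NA)),
  M3 = factor(c("error", NA, "error", NA, NA))
)

# as.matrix() converts the factor columns into one character matrix
m <- as.matrix(df[-1])

# startsWith() checks the prefix of every cell (NA cells give NA);
# matrix() restores the 5 x 3 shape in case the dims are dropped
revised <- matrix(startsWith(m, "REVISED"), nrow = nrow(m))

df$i1 <- rowSums(!is.na(m))                # number of errors per row
df$ix <- rowSums(revised, na.rm = TRUE)    # number of revised errors per row
print(df)
```

Because matrix() fills column by column, the same order in which m is stored, each logical value lands back in the cell it was computed from.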

data

df <- structure(list(ID = 1:5, M1 = structure(c(NA, 1L, 1L, NA, NA), 
.Label = "REVISED-error", class = "factor"), 
M2 = structure(c(2L, 2L, 2L, 1L, NA), .Label = c("error", 
"REVISED-error"), class = "factor"), M3 = structure(c(1L, 
NA, 1L, NA, NA), .Label = "error", class = "factor")), row.names = c(NA, 
-5L), class = "data.frame")
