Reputation: 97
I have a dataframe that is revised every day. When an error occurs, it is checked, and if it can be solved, the keyword "REVISED" is added to the beginning of the error message, like so:
ID M1 M2 M3
1 NA "REVISED-error" "error"
2 "REVISED-error" "REVISED-error" NA
3 "REVISED-error" "REVISED-error" "error"
4 NA "error" NA
5 NA NA NA
I want to find a way to add two columns that help me determine whether there are any errors and how many of them have been revised. Like this:
ID M1 M2 M3 i1 ix
1 NA "REVISED-error" "error" 2 1 <- 2 errors, 1 revised
2 "REVISED-error" "REVISED-error" NA 2 2
3 "REVISED-error" "REVISED-error" "error" 3 2
4 NA "error" NA 1 0
5 NA NA NA 0 0
I found this code:
df <- df%>%mutate(i1 = rowSums(!is.na(.[2:4])))
That helps me to know how many errors are in those specific columns. How can I know if any of said errors contains the keyword REVISED? I've tried a few things but none have worked so far:
df <- df%>%
mutate(i1 = rowSums(!is.na(.[2:4])))%>%
mutate(ie = rowSums(.[2:4] %in% "REVISED"))
This returns the error: 'x' must be an array of at least two dimensions
Upvotes: 1
Views: 50
Reputation: 388907
You could use apply to find the number of times "error" and "REVISED" appear in each row.
df[c("i1", "ix")] <- t(apply(df[-1], 1, function(x)
c(sum(grepl("error", x)), sum(grepl("REVISED", x)))))
df
# ID M1 M2 M3 i1 ix
#1 1 <NA> REVISED-error error 2 1
#2 2 REVISED-error REVISED-error <NA> 2 2
#3 3 REVISED-error REVISED-error error 3 2
#4 4 <NA> error <NA> 1 0
#5 5 <NA> <NA> <NA> 0 0
An alternative approach uses is.na and rowSums to calculate i1.
df$i1 <- rowSums(!is.na(df[-1]))
df$ix <- apply(df[-1], 1, function(x) sum(grepl("REVISED", x)))
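If the row-wise apply becomes slow on a larger data frame, the same counts can probably be computed column by column instead. This is only a minimal sketch: it assumes the columns are plain character vectors rather than the factors in the data below, and cols and revised are names introduced here for illustration.

```r
# Example data rebuilt with character columns (an assumption; the data
# in this answer stores the messages as factors).
df <- data.frame(
  ID = 1:5,
  M1 = c(NA, "REVISED-error", "REVISED-error", NA, NA),
  M2 = c("REVISED-error", "REVISED-error", "REVISED-error", "error", NA),
  M3 = c("error", NA, "error", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the columns that hold error messages.
cols <- c("M1", "M2", "M3")

# One grepl() call per column gives a logical matrix (rows x columns).
# grepl() returns FALSE for NA, so missing cells are never counted.
revised <- sapply(df[cols], function(col) grepl("REVISED", col, fixed = TRUE))

df$i1 <- rowSums(!is.na(df[cols]))  # any non-missing cell is an error
df$ix <- rowSums(revised)           # cells containing "REVISED"
df
```

Note that sapply only returns a matrix when the data frame has more than one row, so a single-row input would need separate handling.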
data
df <- structure(list(ID = 1:5, M1 = structure(c(NA, 1L, 1L, NA, NA),
.Label = "REVISED-error", class = "factor"),
M2 = structure(c(2L, 2L, 2L, 1L, NA), .Label = c("error",
"REVISED-error"), class = "factor"), M3 = structure(c(1L,
NA, 1L, NA, NA), .Label = "error", class = "factor")), row.names = c(NA,
-5L), class = "data.frame")
Upvotes: 2
Reputation: 52268
You can use str_count() from the stringr library to count the number of times REVISED appears, like so:
df <- data.frame(M1=as.character(c(NA, "REVISED-x", "REVISED-x")),
M2=as.character(c("REVISED-x", "REVISED-x", "REVISED-x")),
stringsAsFactors = FALSE)
library(stringr)
df$ix <- str_count(paste0(df$M1, df$M2), "REVISED")
df
# M1 M2 ix
# 1 <NA> REVISED-x 1
# 2 REVISED-x REVISED-x 2
# 3 REVISED-x REVISED-x 2
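The same idea should extend to the three columns in the question without pasting them together. The sketch below assumes character columns and uses cols and counts as illustrative names. Since str_count() returns NA for a missing string, the row totals are taken with na.rm = TRUE.

```r
library(stringr)

# The question's data, rebuilt with character columns (an assumption).
df <- data.frame(
  ID = 1:5,
  M1 = c(NA, "REVISED-error", "REVISED-error", NA, NA),
  M2 = c("REVISED-error", "REVISED-error", "REVISED-error", "error", NA),
  M3 = c("error", NA, "error", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the columns that hold error messages.
cols <- c("M1", "M2", "M3")

# Count occurrences of the literal string "REVISED" in every cell.
# The result is a matrix with NA wherever the cell itself is NA.
counts <- sapply(df[cols], str_count, pattern = fixed("REVISED"))

# na.rm = TRUE makes NA cells contribute zero to the row total.
df$ix <- rowSums(counts, na.rm = TRUE)
df
```

Counting each column separately avoids turning NA into the text "NA", which is what paste0() does before the count.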
Upvotes: 1