Reputation: 153
I am trying to use grep to filter my data but also include NAs in the results, which are currently being dropped because they do not match the grep expression.
platform x86_64-w64-mingw32
version.string R version 3.5.3 (2019-03-11)
   value expected_result actual_result
1  10001            Pass          Pass
2      0            Pass          Pass
3      6            Pass          Pass
4  20004            Pass          Pass
5     NA            Pass          Fail
6   4829            Fail          Fail
7    521            Fail          Fail
8     89            Fail          Fail
9  40012            Fail          Fail
10 47321            Fail          Fail
df <- df[grep("(\\b\\d{1}\\b)|([0-9]{1}[0]{3}[0-9]{1})", df$value),]
1) The value will contain between 0 and 5 numeric characters.
2) Three kinds of values should be retained:
a) A single digit. (Example rows 2 & 3)
b) No data, i.e. NA. (Example row 5)
c) Five digits where the middle three digits are all zeros. (Example rows 1 & 4)
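The dropped NA can be reproduced on a small vector. This is a minimal sketch, assuming a four-element stand-in for df$value (the names value, pattern and hits are only for illustration); it shows that grep returns the positions of the matching elements, so the NA position never appears in the result.

```r
# Small stand-in for df$value (assumed for illustration).
value <- c(10001L, 0L, NA, 4829L)
pattern <- "(\\b\\d{1}\\b)|([0-9]{1}[0]{3}[0-9]{1})"

# grep() returns the positions of matching elements; an NA element
# never matches, so its position is absent from the result.
hits <- grep(pattern, value)
print(hits)

# Subsetting by those positions therefore drops the NA entry.
print(value[hits])
```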
Upvotes: 3
Views: 1528
Reputation: 887223
To include the NA rows, switch from grep to grepl so that the match is a logical vector, then create a second condition with is.na and join the two with | (OR):
df[grepl("(\\b\\d{1}\\b)|([0-9]{1}[0]{3}[0-9]{1})", df$value) | is.na(df$value), ]
#  value expected_result actual_result
#1 10001            Pass          Pass
#2     0            Pass          Pass
#3     6            Pass          Pass
#4 20004            Pass          Pass
#5    NA            Pass          Fail
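To see why the OR works, it can help to look at the two masks separately. The following is a rough sketch on an assumed toy vector (the names matches, missing and keep are illustrative, and the anchored pattern is a simplified stand-in for the one above): grepl gives FALSE rather than NA for a missing element, and is.na flags exactly that element.

```r
# Toy vector mirroring the question's value column (assumed data).
value <- c(10001L, 0L, NA, 4829L)

# grepl() returns FALSE, not NA, when the input element is NA.
matches <- grepl("^([0-9]|[0-9]0{3}[0-9])$", value)

# is.na() flags exactly the element the regex cannot match.
missing <- is.na(value)

# OR-ing the two masks keeps regex matches and NA elements alike.
keep <- matches | missing

print(data.frame(value, matches, missing, keep))
```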
Or make it a bit more compact by anchoring the pattern with ^ and $:
grepl("^\\d$|^([1-9]0{3}[1-9]$)", df$value)|is.na(df$value)
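If the filter is needed in several places, the condition could be wrapped in a small function. This is only a sketch: keep_value is a hypothetical helper, not part of the answer, and the vectors below are copied from the question's example. One caveat, stated as an assumption about the intent: the compact pattern uses [1-9] for the last digit, so a value such as 10000 would not match, whereas rule (c) in the question appears to allow it. The sketch uses [0-9] in both outer positions, as the question's own pattern does.

```r
# Hypothetical helper (not from the original answer): TRUE for values
# that are NA, a single digit, or five digits with three middle zeros.
keep_value <- function(x) {
  # Convert explicitly so the regex sees plain digit strings.
  s <- as.character(x)
  is.na(x) | grepl("^([0-9]|[0-9]0{3}[0-9])$", s)
}

# Values and expected results copied from the question's example.
value <- c(10001L, 0L, 6L, 20004L, NA, 4829L, 521L, 89L, 40012L, 47321L)
expected <- c("Pass", "Pass", "Pass", "Pass", "Pass",
              "Fail", "Fail", "Fail", "Fail", "Fail")

keep <- keep_value(value)

# Check the mask against the expected_result column.
stopifnot(identical(keep, expected == "Pass"))
print(value[keep])
```

With a data frame, the same mask would be used as df[keep_value(df$value), ].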
Data:
df <- structure(list(value = c(10001L, 0L, 6L, 20004L, NA, 4829L, 521L,
89L, 40012L, 47321L), expected_result = c("Pass", "Pass", "Pass",
"Pass", "Pass", "Fail", "Fail", "Fail", "Fail", "Fail"), actual_result = c("Pass",
"Pass", "Pass", "Pass", "Fail", "Fail", "Fail", "Fail", "Fail",
"Fail")), class = "data.frame", row.names = c("1", "2", "3",
"4", "5", "6", "7", "8", "9", "10"))
Upvotes: 4