Reputation: 135
In my code I have a string variable (panel_name) which can take a number of different forms, along the lines of "CVD II", "Onc, IR", "CVD II, CVD III" and so on. I also have a function which searches this variable for specific strings and, based on their presence, outputs other strings.
So, for example, I have:
if (grepl("CVD II", panel_name) == TRUE){
  panel_pref = ""
  panel = "CVD2"
} else if (grepl("CVD III", panel_name) == TRUE){
  panel_pref = ""
  panel = "CVD3"
}
The issue I am running into is that the test for "CVD II" also returns TRUE when panel_name is "CVD III", because "CVD II" is a substring of "CVD III". This is not what I want.
My current solution is to just invert the above code, so it becomes:
if (grepl("CVD III", panel_name) == TRUE){
  panel_pref = ""
  panel = "CVD3"
} else if (grepl("CVD II", panel_name) == TRUE){
  panel_pref = ""
  panel = "CVD2"
}
But this feels a little messy, so I am wondering if there is a way to search for a string specifically within another string.
I can't use if x == y (for example) because the variable sometimes contains more than one of the "names" I am searching for, but grepl does not seem to allow exclusions.
Upvotes: 0
Views: 2377
Reputation: 16842
A couple of regex options to use in your if / else tests:
test_cases <- c("CVD II", "CVD III")
Is II found at the end of the string?
grepl("CVD II$", test_cases)
#> [1] TRUE FALSE
Is II found at the boundary of a word?
grepl("CVD II\\b", test_cases)
#> [1] TRUE FALSE
Is II found without being followed by another I? This uses a negative lookahead, which requires perl = TRUE.
grepl("CVD II(?!I)", test_cases, perl = TRUE)
#> [1] TRUE FALSE
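One thing worth checking against the inputs described in the question: panel_name can hold several names at once, such as "CVD II, CVD III". The sketch below runs the three patterns on a made-up vector (multi_cases is hypothetical, modelled on the forms in the question). It suggests the end-of-string anchor only helps when CVD II is the last name in the string, while the word boundary and the lookahead work in either position.

```r
# Hypothetical inputs modelled on the forms described in the question
multi_cases <- c("CVD II", "CVD III", "CVD II, CVD III", "CVD III, CVD II")

# End-of-string anchor: matches only when "CVD II" is the final name
end_anchor <- grepl("CVD II$", multi_cases)

# Word boundary: matches a standalone "CVD II" anywhere in the string,
# but not the "CVD II" that is the start of "CVD III"
word_boundary <- grepl("CVD II\\b", multi_cases)

# Negative lookahead: "CVD II" not followed by another "I" (needs perl = TRUE)
lookahead <- grepl("CVD II(?!I)", multi_cases, perl = TRUE)

print(data.frame(multi_cases, end_anchor, word_boundary, lookahead))
```

So if the combined strings matter, the \\b or (?!I) form is probably the safer choice for the if / else tests.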
Or you can skip the if else tests and use a vectorized search and paste. The stringi and stringr packages have several convenience functions.
If you don't expect I to show up otherwise, you can simply count occurrences of I and paste that to CVD.
paste0("CVD", stringi::stri_count_regex(test_cases, "I"))
#> [1] "CVD2" "CVD3"
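To see why the "no other I" assumption matters, here is a small base R sketch of the same counting idea (swapping stringi::stri_count_regex for gregexpr and regmatches, so no package is needed). The input vector is hypothetical and includes "Onc, IR" from the question, which contains an I that has nothing to do with a CVD numeral.

```r
# Hypothetical inputs; "Onc, IR" is one of the forms mentioned in the question
panel_names <- c("CVD II", "CVD III", "Onc, IR")

# Count every capital "I" in each string (base R equivalent of counting regex matches)
i_counts <- lengths(regmatches(panel_names, gregexpr("I", panel_names)))

# Pasting the count onto "CVD" is only meaningful for the CVD strings
labels <- paste0("CVD", i_counts)
print(labels)
```

The third element comes out as a CVD label even though the input is not a CVD panel, so counting is only safe once the non-CVD strings have been filtered out.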
Or, a somewhat strange option: your strings contain roman numerals. Extract the I strings that occur after CVD:
stringi::stri_extract_first_regex(test_cases, "(?<=CVD )(I+)")
#> [1] "II" "III"
You could expand that for higher roman numerals by including something like ([IVX]+). Then convert them to roman numeral objects with utils::as.roman, then regular numeric objects, then paste.
paste0("CVD",
       as.numeric(as.roman(stringi::stri_extract_first_regex(test_cases, "(?<=CVD )(I+)"))))
#> [1] "CVD2" "CVD3"
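As a rough sketch of the [IVX]+ extension, here is the same pipeline in base R (regexpr and regmatches stand in for the stringi call). The inputs "CVD IV" and "CVD IX" are made up for illustration and do not appear in the question.

```r
# Hypothetical inputs; "CVD IV" and "CVD IX" are invented for this example
panels <- c("CVD II", "CVD III", "CVD IV", "CVD IX")

# Extract the run of numeral characters that follows "CVD "
# (the lookbehind needs perl = TRUE)
numerals <- regmatches(panels, regexpr("(?<=CVD )[IVX]+", panels, perl = TRUE))

# Roman numeral -> integer -> label
labels <- paste0("CVD", as.integer(utils::as.roman(numerals)))
print(labels)
```

Note that regmatches drops elements with no match, so a string such as "Onc, IR" would simply disappear from the result rather than produce an NA. If positions need to line up with the input, the stringi version above is likely the more convenient one.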
Upvotes: 2
Reputation: 364
Sabor117,
You should check out ?regexp and expand your use of the regular expressions available to you there. For example, if it's just about distinguishing "CVD II" from "CVD III", then you can indicate the end of the string with $, as below:
a <- "CVD III"
grepl(x = a, pattern = "CVD II$")
#> [1] FALSE
Depending on your situation, there could be much better solutions.
Also, if you are new to regular expressions, it helps to be able to experiment with the wildcards and other regex syntax. I would point you to one of the regex resources out there. My personal favorite is https://regex101.com/
Upvotes: 1