Reputation: 493
Suppose I have the following df:
library(dplyr)
library(stringr)
input <- data.frame(
  Id = c(1:6),
  text = c("(714.4) (714) (714*)", "(714.33)", "(189) (1938.23)", "(714.93+) (714*)", "(719)", "(718.4)"))
And I would like to obtain the following output:
Output <- data.frame(
  Id = c(1:6),
  text = c("(714.4) (714) (714*)", "(714.33)", "(189) (1938.23)",
           "(714.93+) (714*)", "(719) (299)", "(718.4)"),
  first_match = c(1,0,0,0,1,0),
  second_match = c(1,1,0,1,1,0))
That is, for the first column I want a 1 if any of (714), (719) or (718) appears. For the second column I want a 1 if any of (714.33), (714*) or (719) appears.
When I want to check whether a pattern occurs in a string, I normally use the str_detect function from the stringr package. However, in this case, with symbols such as . + * in the pattern, I am not getting the expected output.
I have tried the following code, which obviously failed:
attempt_1 <- input %>%
  mutate(first_match = ifelse(str_detect(text, "(714)|(719)|(718)"), 1, 0),
         second_match = ifelse(str_detect(text, "(714\\.33)|(714\\*)|(719)"), 1, 0))
attempt_2 <- input %>%
  mutate(first_match = ifelse(str_detect(text, fixed("(714)|(719)")), 1, 0),
         second_match = ifelse(str_detect(text, "(714\\.33)|(714\\*)"), 1, 0))
I tried escaping the special symbols, and I also tried an exact match with fixed() (I suppose that one fails because the | is no longer interpreted as an OR).
Any ideas?
Upvotes: 2
Views: 1696
Reputation: 886968
We can escape the ( and ) so that they are matched as literal parentheses:
library(dplyr)
library(stringr)
input %>%
  mutate(first_match = +(str_detect(text, "\\(714\\)|\\(719\\)|\\(718\\)")),
         second_match = +(str_detect(text, "\\(714\\.33\\)|\\(714\\*\\)|\\(719\\)")))
#  Id                 text first_match second_match
#1  1 (714.4) (714) (714*)           1            1
#2  2             (714.33)           0            1
#3  3      (189) (1938.23)           0            0
#4  4     (714.93+) (714*)           0            1
#5  5                (719)           1            1
#6  6              (718.4)           0            0
Comparing with OP's expected output
Output
#  Id                 text first_match second_match
#1  1 (714.4) (714) (714*)           1            1
#2  2             (714.33)           0            1
#3  3      (189) (1938.23)           0            0
#4  4     (714.93+) (714*)           0            1
#5  5          (719) (299)           1            1
#6  6              (718.4)           0            0
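If escaping every metacharacter by hand gets tedious, one possible alternative is to match each code as a plain literal string and OR the results together. This is only a minimal base R sketch, assuming the codes are known in advance; detect_any_literal is a hypothetical helper name made up for illustration, not a stringr or dplyr function.

```r
# Sample strings copied from the question
text <- c("(714.4) (714) (714*)", "(714.33)", "(189) (1938.23)",
          "(714.93+) (714*)", "(719)", "(718.4)")

# Hypothetical helper (not part of stringr or of the answer above):
# returns 1 where any of the literal substrings occurs in x, else 0.
detect_any_literal <- function(x, literals) {
  # fixed = TRUE makes grepl() treat each pattern as plain text, not a regex
  hits <- lapply(literals, function(lit) grepl(lit, x, fixed = TRUE))
  # Combine the per-literal logical vectors with an element-wise OR
  as.integer(Reduce(`|`, hits))
}

first_match  <- detect_any_literal(text, c("(714)", "(719)", "(718)"))
second_match <- detect_any_literal(text, c("(714.33)", "(714*)", "(719)"))

first_match
# [1] 1 0 0 0 1 0
second_match
# [1] 1 1 0 1 1 0
```

Because every code is matched literally, no backslashes are needed, and the OR happens in R rather than inside the pattern.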
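In the OP's code, the first attempt didn't work because ( is a regex metacharacter: left unescaped, it opens a capture group instead of matching a literal parenthesis, so the pattern matches the bare digits. In the second attempt, fixed() makes the whole pattern literal, so the | is treated as an ordinary character rather than as OR.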
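Both failure modes can be reproduced directly on the sample strings. A small sketch, assuming only stringr is loaded; the variable names are illustrative and not from the original posts.

```r
library(stringr)

# Sample strings copied from the question
text <- c("(714.4) (714) (714*)", "(714.33)", "(189) (1938.23)",
          "(714.93+) (714*)", "(719)", "(718.4)")

# Attempt 1: unescaped parentheses are capture groups, so this matches the
# digits 714, 719 or 718 anywhere, e.g. inside "(714.33)" or "(718.4)"
unescaped <- str_detect(text, "(714)|(719)|(718)")
unescaped
# [1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE

# Attempt 2: fixed() searches for the whole literal string "(714)|(719)",
# pipe included, which occurs in none of the rows
as_fixed <- str_detect(text, fixed("(714)|(719)"))
as_fixed
# [1] FALSE FALSE FALSE FALSE FALSE FALSE

# Escaped: \\( and \\) match literal parentheses, while | still means OR
escaped <- str_detect(text, "\\(714\\)|\\(719\\)|\\(718\\)")
escaped
# [1]  TRUE FALSE FALSE FALSE  TRUE FALSE
```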
Upvotes: 3