Reputation: 27
I want to select every line that contains the expression "X01" or "X02":
dataEx <- data.frame(code = c("X01-X043","X034","X024","X015-X036-X033","X012","X015-X042","X019","X036","X022-X043"),res = NA )
pat1 <- c("(^|-)X01($|-|.)","(^|-)X02($|-|.)")
dataEx$res[grep(paste(pat1,collapse="|"),dataEx$code)] <- "ok"
It works correctly and gives me this result:
code res
1 X01-X043 ok
2 X034 <NA>
3 X024 ok
4 X015-X036-X033 ok
5 X012 ok
6 X015-X042 ok
7 X019 ok
8 X036 <NA>
9 X022-X043 ok
But I would like to know which pattern was matched:
code res
1 X01-X043 X01
2 X034 <NA>
3 X024 X024
4 X015-X036-X033 X015
5 X012 X012
6 X015-X042 X015
7 X019 X019
8 X036 <NA>
9 X022-X043 X022
I am very new to regular expressions. Is there an easy way to do this? (In reality, "pat1" is much longer: I am looking for 20 different patterns.)
Upvotes: 1
Views: 66
Reputation: 21400
You can use str_extract in this way:
library(stringr)
dataEx$res <- str_extract(dataEx$code, "X0(1|2)\\d?")
Here, we match the literal X0, followed by either 1 or 2, followed by one optional digit (\\d?).
Result:
dataEx
code res
1 X01-X043 X01
2 X034 <NA>
3 X024 X024
4 X015-X036-X033 X015
5 X012 X012
6 X015-X042 X015
7 X019 X019
8 X036 <NA>
9 X022-X043 X022
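For a longer list, such as the 20 patterns mentioned in the question, the alternation can be built from a vector instead of being typed by hand. Below is a minimal sketch in base R (regexpr() and regmatches() rather than stringr). The names `prefixes` and `res` are only illustrative, and it assumes every pattern is a plain prefix followed by at most one extra digit, as in the example data.

```r
codes <- c("X01-X043", "X034", "X024", "X015-X036-X033", "X012",
           "X015-X042", "X019", "X036", "X022-X043")

# Hypothetical prefix list; extend it to all 20 patterns in practice.
prefixes <- c("X01", "X02")

# Builds "(X01|X02)[0-9]?": any listed prefix plus one optional digit.
pattern <- paste0("(", paste(prefixes, collapse = "|"), ")[0-9]?")

# regexpr() gives the position of the first match per element, or -1 if none.
m <- regexpr(pattern, codes)

# regmatches() returns only the matched substrings, so fill the matching rows
# and leave the others as NA.
res <- rep(NA_character_, length(codes))
res[m != -1] <- regmatches(codes, m)
res
```

If the prefixes ever contain regex metacharacters, they would need escaping before being pasted together.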
Upvotes: 1
Reputation: 79208
You could do:
a <- regmatches(dataEx$code, gregexpr(paste(pat1, collapse = "|"), dataEx$code))
is.na(a) <- lengths(a) == 0
dataEx$res <- unlist(a)
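The open question, though, is what should happen if a row has more than one match: unlist(a) would then be longer than the number of rows, and the assignment to dataEx$res would fail.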
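One possible way to handle several matches per row is to collapse them into a single string. This is only a sketch: the data below is made up so that some rows have two hits, the simplified pattern "(X01|X02)[0-9]?" is used instead of pat1, and the ";" separator is an arbitrary choice.

```r
# Hypothetical data in which some rows contain two matching codes.
codes <- c("X01-X024", "X034", "X015-X036-X022")
pattern <- "(X01|X02)[0-9]?"

# gregexpr() finds every match per element; regmatches() returns a list
# with one character vector per row (empty when nothing matched).
hits <- regmatches(codes, gregexpr(pattern, codes))

# Collapse each row's matches into one string, or NA when there are none.
res <- vapply(
  hits,
  function(h) if (length(h) == 0) NA_character_ else paste(h, collapse = ";"),
  character(1)
)
res
```

Because vapply() always returns exactly one value per row, the result can be assigned to a data frame column regardless of how many matches each row has.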
Upvotes: 0
Reputation: 371
Are you open to using the stringr package? I agree with Jaskeil: I tend to prefer data.table over data.frame, but that is primarily for execution speed, and I'm not sure whether that will be a concern for your application.
library(stringr)
dataEx <- data.frame(code = c("X01-X043","X034","X024","X015-X036-X033","X012","X015-X042","X019","X036","X022-X043"),res = NA )
dataEx$res <- str_extract(dataEx$code, "((^|-)X01($|-|.))|((^|-)X02($|-|.))")
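Note that this pattern also consumes the delimiter, so the first row comes back as "X01-" rather than "X01". A rough sketch of trimming the hyphens afterwards, written in base R (regexpr() instead of str_extract()) so it runs without extra packages; the variable names `raw` and `res` are only illustrative:

```r
codes <- c("X01-X043", "X034", "X024", "X015-X036-X033", "X012",
           "X015-X042", "X019", "X036", "X022-X043")

# Same pattern as above, with the optional delimiters on both sides.
pattern <- "((^|-)X01($|-|.))|((^|-)X02($|-|.))"

# First match per row, NA where nothing matched.
m <- regexpr(pattern, codes)
raw <- rep(NA_character_, length(codes))
raw[m != -1] <- regmatches(codes, m)

# Strip a leading or trailing hyphen that the pattern picked up.
res <- gsub("^-|-$", "", raw)
res
```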
Upvotes: 0