Reputation: 197
I’m struggling to get a bit of regular expression code to work. I have a long list of strings from which I need to extract a part. I only need the strings that start with “WER”, and from those only the last part of the string, starting at (and including) the last letter.
test <- c("abc00012Z345678","WER0004H987654","WER12400G789456","WERF12","0-0Y123")
Here is a line of code that works, but only for one letter (H). In my list of strings, it can be any letter.
ifelse(substr(test,1,3)=="WER",gsub("^.*H.*?","H",test),"")
What I’m hoping to achieve is the following:
H987654
G789456
F12
Upvotes: 2
Views: 1318
Reputation: 626835
You can use the following pattern with gsub:
> gsub("^(?:WER.*([a-zA-Z]\\d*)|.*)$", "\\1", test)
[1] "" "H987654" "G789456" "F12" ""
See the regex demo
This pattern matches:
^ - start of a string
(?: - start of an alternation group with 2 alternatives:
  WER.*([a-zA-Z]\\d*) - the WER char sequence followed with 0+ any characters (.*), as many as possible, up to the last letter ([a-zA-Z]) followed by 0+ digits (\\d*) (replace with \\d+ to require at least 1 digit)
  | - or
  .* - any 0+ characters (so strings that do not start with WER still match and are replaced with an empty string)
)$ - closing the alternation group and matching the end of string with $.
With str_match from stringr, it is even tidier:
> library(stringr)
> res <- str_match(test, "^WER.*([a-zA-Z]\\d*)$")
> res[,2]
[1] NA "H987654" "G789456" "F12" NA
If there are newlines in the input, add (?s) at the beginning of the pattern so that . also matches line breaks:
> res <- str_match(test, "(?s)^WER.*([a-zA-Z]\\d*)$")
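The \\d* versus \\d+ remark in the explanation above only shows a difference when the last letter has no digits after it. A minimal sketch of that difference follows. The inputs "WER123X" and "abcZ1" are made-up examples, not from the question, and perl = TRUE is an assumption added here so that the alternatives are tried left to right:

```r
# Hypothetical inputs; "WER123X" ends in a letter with no digits after it.
x <- c("WER0004H987654", "WERF12", "WER123X", "abcZ1")

# \\d* accepts a last letter followed by zero or more digits.
zero_or_more <- gsub("^(?:WER.*([a-zA-Z]\\d*)|.*)$", "\\1", x, perl = TRUE)

# \\d+ requires at least one digit after the last letter.
one_or_more <- gsub("^(?:WER.*([a-zA-Z]\\d+)|.*)$", "\\1", x, perl = TRUE)

print(zero_or_more)
print(one_or_more)
```

Only "WER123X" should differ between the two results: \\d* keeps "X", while \\d+ falls through to the second alternative and yields an empty string.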
Upvotes: 5
Reputation: 70266
If you don't want empty strings or NA for strings that don't start with "WER", you could try the following approach:
sub(".*([A-Z].*)$", "\\1", test[grepl("^WER", test)])
#[1] "H987654" "G789456" "F12"
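The same filter-then-extract idea could be wrapped in a small helper. This is only a sketch: the name extract_wer_suffix is made up for illustration, and it uses [A-Za-z] instead of [A-Z] on the assumption that lowercase letters may also occur in the data.

```r
# Hypothetical helper; the name and the [A-Za-z] class are assumptions,
# not part of the answer above.
extract_wer_suffix <- function(x) {
  keep <- x[startsWith(x, "WER")]        # keep only strings beginning with "WER"
  sub(".*([A-Za-z].*)$", "\\1", keep)    # drop everything before the last letter
}

test <- c("abc00012Z345678", "WER0004H987654", "WER12400G789456", "WERF12", "0-0Y123")
res <- extract_wer_suffix(test)
print(res)
```

Because the non-matching strings are dropped before the substitution, the result is shorter than the input, so it cannot be assigned back as a column of the same length without extra bookkeeping.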
Upvotes: 3