Reputation: 2998
I have the following code with a regex
CHARACTER <- ^([A-Z0-9 .])+(?:\\(V\\.O\\.\\))?(?:\\(O\\.S\\.\\))?(?:\\(CONT'D\\))?$
str_match("WILL (V.O.)",CHARACTER)[1,2]
I thought this should match the value of "WILL " but it is returning blank. I tried the RegEx in a different language and the group is coming back blank in that instance also.
What do I have to add to this regex to pull back just the value "WILL"?
Upvotes: 1
Views: 47
Reputation: 627077
You formed a repeated capturing group by placing +
outside a group. Put it back:
CHARACTER <- "^([A-Z0-9 .]+)(?:\\(V\\.O\\.\\))?(?:\\(O\\.S\\.\\))?(?:\\(CONT'D\\))?$"
^
Note you may trim Will
if you use a lazy match with \s*
after the group:
CHARACTER <- "^([A-Z0-9\\s.]+?)\\s*(?:\\(V\\.O\\.\\))?(?:\\(O\\.S\\.\\))?(?:\\(CONT'D\\))?$"
See this regex demo.
> library(stringr)
> CHARACTER <- "^([A-Z0-9\\s.]+?)\\s*(?:\\(V\\.O\\.\\))?(?:\\(O\\.S\\.\\))?(?:\\(CONT'D\\))?$"
> str_match("WILL (V.O.)",CHARACTER)[1,2]
[1] "WILL"
Alternatively, you may just extract Will
with
> str_extract("WILL (V.O.)", "^.*?(?=\\s*(?:\\(V\\.O\\.\\))?(?:\\(O\\.S\\.\\))?(?:\\(CONT'D\\))?$)")
[1] "WILL"
Or the same with base R:
> regmatches(x, regexpr("^.*?(?=\\s*(?:\\(V\\.O\\.\\))?(?:\\(O\\.S\\.\\))?(?:\\(CONT'D\\))?$)", x, perl=TRUE))
[1] "WILL"
Here,
^
- matches the start of a string.*?
- any 0+ chars other than line break chars as few as possible(?=\\s*(?:\\(V\\.O\\.\\))?(?:\\(O\\.S\\.\\))?(?:\\(CONT'D\\))?$)
- makes sure that, immediately to the right of the current location, there is
\\s*
- 0+ whitespaces(?:\\(V\\.O\\.\\))?
- an optional (V.O.)
substring(?:\\(O\\.S\\.\\))?
- an optional (O.S.)
substring(?:\\(CONT'D\\))?
- an optional (CONT'D)
substring$
- end of string.Upvotes: 1