Reputation: 3422
I have the following string:
"Josua de Grave* (1643-1712)"
Everything before the * is the person's name, the first date, 1643, is his birth date, and 1712 is the date of his death.
Following this logic, I'd like to have three match groups, one for each item. I tried:
([a-zA-Z|\s]*)\* (\d{3,4})-(\d{3,4})
"Josua de Grave* (1643-1712)".match(/([a-zA-Z|\s]*)\* (\d{3,4})-(\d{3,4})/)
but that returns nil.
Why is my logic wrong, and what should I do to get the three intended match groups?
Upvotes: 2
Views: 214
Reputation: 37404
The parentheses ( ) around the dates 1643-1712 are missing from your regex pattern, so use:
([a-zA-Z\s]*)\* \((\d{3,4})-(\d{3,4})\)
//              ^^                   ^^
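Unescaped parentheses denote a capture group, so escape them with \ to match them as literal characters. (The | inside your character class is also unnecessary: within [...] it matches a literal |, not "or".)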
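As a quick check, here is a minimal sketch that runs the corrected pattern against the string from the question. The variable names (pattern, m, name, birth, death) are only illustrative and do not come from the original posts.

```ruby
s = "Josua de Grave* (1643-1712)"

# Original attempt: the literal "(" after "* " is never matched, so no match.
original = /([a-zA-Z|\s]*)\* (\d{3,4})-(\d{3,4})/
original_result = s.match(original) # nil

# Corrected pattern: \( and \) match the literal parentheses around the dates.
pattern = /([a-zA-Z\s]*)\* \((\d{3,4})-(\d{3,4})\)/
m = s.match(pattern)

# MatchData#captures returns only the three groups, without the full match.
name, birth, death = m.captures
puts name  # Josua de Grave
puts birth # 1643
puts death # 1712
```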
Upvotes: 1
Reputation: 110675
r = /\*\s+\(|(?<=\d)\s*-\s*|\)/
"Josua de Grave* (1643-1712)".split r
#=> ["Josua de Grave", "1643", "1712"]
"Sir Winston Leonard Spencer-Churchill* (1874 - 1965)".split r
#=> ["Sir Winston Leonard Spencer-Churchill", "1874", "1965"]
The regular expression can be made self-documenting by writing it in free-spacing mode:
r = /
  \*\s+\(     # match '*' then >= 1 whitespaces then '('
  |           # or
  (?<=\d)     # match is preceded by a digit (positive lookbehind)
  \s*-\s*     # match >= 0 whitespaces then '-' then >= 0 whitespaces
  |           # or
  \)          # match ')'
/x            # free-spacing regex definition mode
The positive lookbehind is needed to avoid splitting hyphenated names on hyphens. (The positive lookahead (?=\d), placed after \s*-\s*, could be used instead.)
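The lookahead alternative mentioned in the parenthetical might look like the sketch below. The name r_lookahead is hypothetical; the only change from r is that the digit check moves from before the hyphen to after it.

```ruby
# Same three alternatives as r, but the hyphen must be FOLLOWED by a digit
# (positive lookahead) instead of being preceded by one.
r_lookahead = /\*\s+\(|\s*-\s*(?=\d)|\)/

josua = "Josua de Grave* (1643-1712)".split(r_lookahead)
churchill = "Sir Winston Leonard Spencer-Churchill* (1874 - 1965)".split(r_lookahead)

p josua     # ["Josua de Grave", "1643", "1712"]
p churchill # ["Sir Winston Leonard Spencer-Churchill", "1874", "1965"]
```

In both strings the hyphen inside a name is followed by a letter, so the lookahead fails there and the name stays intact.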
Upvotes: 1
Reputation: 160551
While you can use a pattern, splitting this string into its parts can also be done easily with other Ruby methods:
Using split:
s = "Josua de Grave* (1643-1712)"
name, dates = s.split('*') # => ["Josua de Grave", " (1643-1712)"]
birth, death = dates[2..-2].split('-') # => ["1643", "1712"]
Or, using scan:
*name, birth, death = s.scan(/[[:alnum:]]+/) # => ["Josua", "de", "Grave", "1643", "1712"]
name.join(' ') # => "Josua de Grave"
birth # => "1643"
death # => "1712"
If I were using a pattern, I'd use this:
name, birth, death = /^([^*]+).+?(\d+)-(\d+)/.match(s)[1..3] # => ["Josua de Grave", "1643", "1712"]
name # => "Josua de Grave"
birth # => "1643"
death # => "1712"
/^([^*]+).+?(\d+)-(\d+)/ means:
^ : start at the beginning of a line (in Ruby, \A is the anchor for the start of the whole string)
([^*]+) : capture everything that is not *, which is where it'll stop capturing
.+? : skip the minimum until...
(\d+) : the birth year is matched and captured
- : match but don't capture
(\d+) : the death year is matched and captured
Regexper helps explain it, as does Rubular.
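The same breakdown can be kept next to the pattern by writing it in free-spacing mode. This is only a sketch of that idea; the variable name pattern is illustrative, and the regex itself is unchanged apart from whitespace and comments.

```ruby
s = "Josua de Grave* (1643-1712)"

pattern = /
  ^          # start of a line
  ([^*]+)    # capture everything up to, but not including, the first '*'
  .+?        # skip as little as possible until...
  (\d+)      # ...the birth year, captured
  -          # a literal hyphen, matched but not captured
  (\d+)      # the death year, captured
/x

name, birth, death = pattern.match(s).captures
puts name  # Josua de Grave
puts birth # 1643
puts death # 1712
```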
Upvotes: 1