Reputation: 2086
I have a text list that I am trying to capture data from using a regex.
Here is a sample of the text format:
(1) this is a sample string /(2) something strange /(3) another bit of text /(4) the last one/ something!/
I have a regex that captures this correctly, but I am having difficulty making it work on outlier input.
Here is my regex:
/\(?\d\d?\)([^\)]+)(\/|\z)/
Unfortunately some of the data contains parentheses like this:
(1) this is a sample string (1998-1999) /(2) something strange (blah) /(3) another bit of text /(4) the last one/ something!/
The substrings '(1998-1999)' and '(blah)' make it fail!
Anyone care to have a crack at this one? Thank you :D
Upvotes: 1
Views: 2192
Reputation: 625087
I would try this:
\((\d+)\)\s+(.*?)(?=/(?:\(\d+\)|\z))
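This rather scary-looking regex does the following:
- \((\d+)\) matches one or more digits wrapped in parentheses and captures the digits;
- \s+ requires at least one whitespace character after the closing parenthesis, which is not captured;
- (.*?) is a non-greedy wildcard that captures the item text. I prefer this to a negated character class (e.g. [^/]+) for this kind of problem;
- the positive lookahead ((?=...)) says the expression must be followed by a forward slash and then one of: one or more digits wrapped in parentheses, or the end of the string (\z).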
To give you an example in PHP (you don't specify your language):
$s = '(1) this is a sample string (1998-1999) /(2) something strange (blah) /(3) another bit of text /(4) the last one/ something!/';
preg_match_all('!\((\d+)\)\s+(.*?)(?=/(?:\(\d+\)|\z))!', $s, $matches);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => (1) this is a sample string (1998-1999)
[1] => (2) something strange (blah)
[2] => (3) another bit of text
[3] => (4) the last one/ something!
)
[1] => Array
(
[0] => 1
[1] => 2
[2] => 3
[3] => 4
)
[2] => Array
(
[0] => this is a sample string (1998-1999)
[1] => something strange (blah)
[2] => another bit of text
[3] => the last one/ something!
)
)
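For readers not working in PHP, here is a rough Python port of the same pattern. It is only a sketch: the name ITEM and the .strip() call are my additions, and \Z is used because it is Python's end-of-string anchor where the original uses \z. Verbose mode lets each piece of the pattern carry its own comment.

```python
import re

# Verbose-mode port of the answer's pattern (ITEM is a hypothetical name).
ITEM = re.compile(r"""
    \((\d+)\)       # item number in parentheses, digits captured as group 1
    \s+             # at least one whitespace character, not captured
    (.*?)           # item text, matched lazily, captured as group 2
    (?=             # lookahead: the text must be followed by
        /           #   a forward slash, then either
        (?:\(\d+\)  #   the next parenthesised number
        |\Z)        #   or the end of the string
    )
""", re.VERBOSE)

s = ("(1) this is a sample string (1998-1999) /(2) something strange (blah) "
     "/(3) another bit of text /(4) the last one/ something!/")

# findall returns one (number, text) tuple per item; strip() drops the
# trailing space that precedes each "/" separator (an addition to the original).
items = [(int(n), text.strip()) for n, text in ITEM.findall(s)]

for n, text in items:
    print(n, text)
```

Because the lookahead only accepts a slash that is followed by a new "(n)" marker or the end of the string, the slash inside "the last one/ something!" stays part of item 4, and the parentheses in "(1998-1999)" and "(blah)" are never mistaken for item numbers.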
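Some notes:
- I've used \d+ so the number can have any count of digits; if you need to limit it to one or two digits, replace \d+ with \d\d?.
Upvotes: 1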
Reputation: 523304
Prepend a / to the beginning of the string, append a (0) to the end of the string, then split the whole string with the pattern \/\(\d+\), and discard the first and last empty elements.
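A minimal Python sketch of this split approach, in case it helps. The function name split_items is hypothetical, and the code assumes the input ends with a trailing / as in the question's sample; otherwise the appended (0) would not be recognised as a delimiter.

```python
import re

def split_items(s):
    """Split '(1) foo /(2) bar /' style text on the '/(n)' delimiters."""
    # Sentinels: a leading "/" turns the first "(1)" into a delimiter, and a
    # trailing "(0)" pairs with the final "/" to form a closing delimiter.
    padded = "/" + s + "(0)"
    parts = re.split(r"/\(\d+\)", padded)
    # The first and last pieces are the empty strings outside the sentinels.
    # strip() (an addition to the original suggestion) trims stray spaces.
    return [p.strip() for p in parts[1:-1]]

sample = ("(1) this is a sample string (1998-1999) /(2) something strange (blah) "
          "/(3) another bit of text /(4) the last one/ something!/")
result = split_items(sample)
print(result)
```

Unlike the lookahead regex, this version discards the item numbers; wrapping \d+ in a capturing group inside the split pattern would be one way to keep them.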
Upvotes: 1