Reputation: 41
I want to split a string on semicolons, except for the semicolons inside square brackets.
string="'[Forsyth, Jennifer K.; Asarnow, Robert F.] Univ Calif Los Angeles, Dept Psychol, Los Angeles, CA 90095 USA; [Bachman, Peter] Univ Pittsburgh, Dept Psychiat, Pittsburgh, PA 15213 USA; [Mathalon, Daniel H.] Univ Calif San Francisco, Dept Psychiat, San Francisco, CA 94143 USA; [Mathalon, Daniel H.; Roach, Brian J.] San Francisco VA Med Ctr, San Francisco, CA 94121 USA; [Asarnow, Robert F.] Univ Calif Los Angeles, Dept Psychiat & Biobehav Sci, Los Angeles, CA 90095 USA'"
When I used
strung=filter(None, re.split("[;]", string))
the output was
["'[Forsyth, Jennifer K.",
' Asarnow, Robert F.] Univ Calif Los Angeles, Dept Psychol, Los Angeles, CA 90095 USA',
' [Bachman, Peter] Univ Pittsburgh, Dept Psychiat, Pittsburgh, PA 15213 USA',
This split on every semicolon, including the ones within the square brackets. How do I keep the square brackets and the semicolons within them intact, and split only on all the other semicolons?
Upvotes: 1
Views: 86
Reputation: 2687
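Brackets have a special meaning in regular expressions: they define a character class, which matches a single character out of the listed characters. That makes [;] equivalent to a plain ; which is why every semicolon was split on. To match literal brackets, you have to escape them:
\[;\]
This escapes the brackets in the regex, so the pattern matches the literal three-character text [;] rather than a semicolon that sits somewhere between two brackets.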
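To make the difference between a character class and escaped brackets concrete, here is a minimal sketch. The shortened `sample` string is a made-up stand-in for the string in the question, not the original data.

```python
import re

# Hypothetical, shortened sample in the same shape as the question's string.
sample = "[A, B.; C, D.] Univ One, USA; [E, F.] Univ Two, USA"

# A character class with a single member behaves exactly like that member,
# so "[;]" splits on every semicolon, just as ";" does.
class_split = re.split(r"[;]", sample)
plain_split = re.split(r";", sample)
print(class_split == plain_split)

# Escaped brackets are literal: this pattern looks for the text "[;]",
# which never occurs in the sample, so the string comes back unsplit.
literal_split = re.split(r"\[;\]", sample)
print(literal_split)
```

Neither pattern distinguishes semicolons inside brackets from those outside; that takes some form of context check around the semicolon.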
Upvotes: 2
Reputation: 785256
You can use a negative lookahead based regex for splitting:
strung = filter(None, re.split(r';(?![^\[\]]*\])', string))
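(?![^\[\]]*\]) is a negative lookahead asserting that the ; is not within [...]: the match is rejected whenever the next bracket character after the semicolon is a closing ]. This relies on the brackets being balanced and not nested.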
Output:
'[Forsyth, Jennifer K.; Asarnow, Robert F.] Univ Calif Los Angeles, Dept Psychol, Los Angeles, CA 90095 USA
[Bachman, Peter] Univ Pittsburgh, Dept Psychiat, Pittsburgh, PA 15213 USA
[Mathalon, Daniel H.] Univ Calif San Francisco, Dept Psychiat, San Francisco, CA 94143 USA
[Mathalon, Daniel H.; Roach, Brian J.] San Francisco VA Med Ctr, San Francisco, CA 94121 USA
[Asarnow, Robert F.] Univ Calif Los Angeles, Dept Psychiat & Biobehav Sci, Los Angeles, CA 90095 USA'
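The lookahead split above can be wrapped in a small self-contained script. This is only a sketch: the helper name `split_outside_brackets` and the shortened `sample` string are made up for illustration, and it assumes brackets are balanced and never nested. It uses a list comprehension instead of filter, because in Python 3 filter returns a lazy iterator rather than a list.

```python
import re

# Hypothetical, shortened sample in the same shape as the question's string.
sample = (
    "[A, B.; C, D.] Univ One, USA; "
    "[E, F.] Univ Two, USA; "
    "[G, H.; I, J.] Univ Three, USA"
)

# Match a semicolon only if it is NOT followed by a "]" that is reached
# without passing another "[" or "]" first, i.e. only outside brackets.
OUTSIDE_SEMICOLON = re.compile(r";(?![^\[\]]*\])")

def split_outside_brackets(text):
    """Split on semicolons outside [...] and drop empty or blank pieces."""
    parts = OUTSIDE_SEMICOLON.split(text)
    # strip() removes the space left after each separator.
    return [part.strip() for part in parts if part.strip()]

affiliations = split_outside_brackets(sample)
for affiliation in affiliations:
    print(affiliation)
```

Each printed line keeps its bracketed author group intact, including the semicolons inside it.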
Upvotes: 4