Reputation: 2230
I want to split this text into all the different characters in the story, but you can see there are notes and other data stored in brackets and parentheses:
var string = "Batman [Bruce Wayne; also as Two-Face]; Joker; Ra's al Ghul; Mr. Freeze; Killer Moth; Poison Ivy; Mad Hatter; Spook; Scarecrow; Captain Stingaree; Cavalier; Cluemaster; Signalman; Batman [Jerry Randall]; Tweedle Dum; Tweedle Dee; Catwoman; Riddler; Lex Luthor; Superman; Two-Face; Commissioner Jim Gordon; Arkham Asylum";
In general you can split this string like so:
string.split(';')
And you'll get pretty close, but there are cases where there is a semicolon between the brackets or parentheses, so "Batman" in this case gets broken into two characters.
QUESTION: How can I remove the semicolons inside the brackets and parentheses before splitting?
I tried a regex like this:
characters.replace('/(\[[^)]*);([^)]*\])/', '$1$2')
But it doesn't seem to work. Any ideas?
Upvotes: 3
Views: 4534
Reputation: 7948
This pattern should match the semicolons inside brackets: /;(?=((?!\[).)*?\])/g
To match semicolons outside brackets instead: /;(?=(((?!\]).)*\[)|[^\[\]]*$)/g
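A minimal sketch of how the two patterns might be used, run on a shortened sample rather than the full string. The variable names are made up for illustration. One assumption to flag: `split` includes the contents of capturing groups in its result, so for the second pattern the groups are rewritten as non-capturing `(?:...)` here; that rewrite is a variation on the pattern above, not the pattern as posted.

```javascript
// Shortened sample in the same format as the question's string.
var sample = "Batman [Bruce Wayne; also as Two-Face]; Joker; Batman [Jerry Randall]; Catwoman";

// First pattern: matches semicolons that sit inside [...].
// Removing them makes a plain split on ';' safe.
var insideBrackets = /;(?=((?!\[).)*?\])/g;
var removedInside = sample
  .replace(insideBrackets, '')
  .split(';')
  .map(function (s) { return s.trim(); });

// Second pattern, with non-capturing groups: matches semicolons outside [...],
// so it can be handed to split() directly and the notes stay untouched.
var outsideBrackets = /;(?=(?:(?!\]).)*\[|[^\[\]]*$)/;
var keptInside = sample
  .split(outsideBrackets)
  .map(function (s) { return s.trim(); });

console.log(removedInside);
console.log(keptInside);
```

The first approach drops the semicolon from the note, the second keeps the note exactly as written.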
Upvotes: 2
Reputation: 89629
You don't need to remove the ; inside square brackets before the split if you use this code:
result = string.split(/\s*;\s*(?![^[]*])/);
(I added \s* to trim leading and trailing spaces.)
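For illustration, here is that split on a shortened sample, followed by a possible extension that also skips semicolons inside parentheses. The second regex and both variable names are assumptions of this sketch, not part of the code above, and neither pattern tries to handle nested brackets.

```javascript
// Shortened sample in the same format as the question's string.
var sample = "Batman [Bruce Wayne; also as Two-Face]; Joker; Batman [Jerry Randall]; Catwoman";

// Split on a semicolon only when it is NOT followed by "...]" with no "[" in between,
// i.e. only when the semicolon is outside square brackets.
var bracketSplit = sample.split(/\s*;\s*(?![^[]*])/);

// Hypothetical variant: treat ( ) the same way as [ ].
var mixed = "A (x; y); B [p; q]; C";
var mixedSplit = mixed.split(/\s*;\s*(?![^\[(]*[\])])/);

console.log(bracketSplit);
console.log(mixedSplit);
```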
Upvotes: 0
Reputation: 38888
Here you are:
/(\[.*?);(.*?\])/g
Example:
var string = "Batman [Bruce Wayne; also as Two-Face]; Joker; Ra's al Ghul; Mr. Freeze; Killer Moth; Poison Ivy; Mad Hatter; Spook; Scarecrow; Captain Stingaree; Cavalier; Cluemaster; Signalman; Batman [Jerry Randall]; Tweedle Dum; Tweedle Dee; Catwoman; Riddler; Lex Luthor; Superman; Two-Face; Commissioner Jim Gordon; Arkham Asylum";
string.replace(/(\[.*?);(.*?\])/g, '$1$2')
"Batman [Bruce Wayne also as Two-Face]; Joker; Ra's al Ghul; Mr. Freeze; Killer Moth; Poison Ivy; Mad Hatter; Spook; Scarecrow; Captain Stingaree; Cavalier; Cluemaster; Signalman; Batman [Jerry Randall]; Tweedle Dum; Tweedle Dee; Catwoman; Riddler; Lex Luthor; Superman; Two-Face; Commissioner Jim Gordon; Arkham Asylum"
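The replace above removes one semicolon per match, and because `.*?` can run past a closing bracket, an input where a bracket without a semicolon comes before one with a semicolon may lose a separator. A possible variation, sketched here under the assumption that brackets are not nested, is to match each whole bracket group and strip the semicolons inside it with a callback. The function name `stripBracketSemicolons` is made up for this example.

```javascript
// Remove every semicolon inside each [...] group, leaving separators alone.
// Assumes brackets are balanced and not nested.
function stripBracketSemicolons(input) {
  return input.replace(/\[[^\]]*\]/g, function (group) {
    return group.replace(/;/g, '');
  });
}

var sample = "A [x; y; z]; B [p]; C [q; r]";
var cleaned = stripBracketSemicolons(sample);

// With the notes cleaned, splitting on the remaining semicolons is safe.
var parts = cleaned.split(/\s*;\s*/);

console.log(cleaned);
console.log(parts);
```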
Upvotes: 1
Reputation: 349
You can remove them or just protect the ones inside the brackets the same way...
Replace the semicolons you want to protect with an arbitrary string that won't occur naturally:
string.replace(/([\[\(][^\[\(\]\)]+);([^\[\(\]\)]+[\]\)])/g,'$1~~$2')
Replace the remaining semicolons with a different arbitrary string that won't occur naturally (and clean up those spaces):
.replace(/; */g,"^^")
Switch the protected strings back to semicolons:
.replace(/~~/g,";")
Split what you have left:
.split("^^");
...that should give you your desired result.
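Chained together, the steps above might look like the sketch below. The wrapper function `splitCharacters` and the shortened sample are assumptions for illustration, and as written the first replace protects one semicolon per bracket group.

```javascript
// Protect, replace, restore, split: the four steps as one chain.
function splitCharacters(input) {
  return input
    // 1. Protect a semicolon found inside [...] or (...) with "~~".
    .replace(/([\[\(][^\[\(\]\)]+);([^\[\(\]\)]+[\]\)])/g, '$1~~$2')
    // 2. Turn the remaining semicolons (and trailing spaces) into "^^".
    .replace(/; */g, '^^')
    // 3. Restore the protected semicolons.
    .replace(/~~/g, ';')
    // 4. Split on the temporary separator.
    .split('^^');
}

var sample = "Batman [Bruce Wayne; also as Two-Face]; Joker; Batman [Jerry Randall]; Catwoman";
var characters = splitCharacters(sample);

console.log(characters);
```

The placeholders `~~` and `^^` only work if neither appears in the input, so pick different ones if your data might contain them.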
Upvotes: 0