MarksCode

Reputation: 8584

String split with regex gives unexpected result

I'm trying to split a string of HTML markup so that each <ul> block ends up as a separate element in the resulting array. I've created the following regex, which seems to work for finding <ul>...</ul>:

/(<ul>.*?<\/ul>)/i

I know it works because I tested it here: https://regex101.com/r/DNAHzr/2

However, as seen in the snippet below, split() doesn't seem to actually split my markup on the given regex:

var body = "soupp\n\nWhat a bloody nice video!! :)) {{youtube:hyYnAioXOqQ}}\n\nSuppp\n\n<ul>\n<li>1\n</li>\n<li><b>2</b>\n</li>\n</ul>\n{{attachment:2938222}}\n\n<ul>\n<li>1\n</li>\n<li>2\n</li>\n</ul>\n<ol>\n<li>bruhh\n</li>\n<li>twotwo\n</li>\n</ol>"

var comps = body.split(/(<ul>.*?<\/ul>)/i).filter(x => !!x);

console.log(comps);

Can anybody help me get my method to work properly?

Upvotes: 0

Views: 50

Answers (1)

Mark

Reputation: 92440

If I understand your question right, you want something like this:

[ 'soupp\n\nWhat a bloody nice video!! :)) {{youtube:hyYnAioXOqQ}}\n\nSuppp\n\n',
'<ul>',
'\n<li>1\n</li>\n<li><b>2</b>\n</li>\n',
'</ul>',
'\n{{attachment:2938222}}\n\n',
'<ul>',
'\n<li>1\n</li>\n<li>2\n</li>\n',
'</ul>',
'\n<ol>\n<li>bruhh\n</li>\n<li>twotwo\n</li>\n</ol>' ]

Is that right?

If so, you should be able to simply use:

var comps = body.split(/(<\/?ul>)/g);
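
As a small illustration of why the parentheses matter: when the regex passed to split() contains a capturing group, the captured separators are kept in the result array. This is only a sketch. The `sample` string is a made-up short input rather than the full string from the question, and the pattern uses an escaped slash (`\/?`) so that only `<ul>` and `</ul>` match.

```javascript
// Hypothetical short input, not the asker's full string.
var sample = "a<ul>\n<li>1</li>\n</ul>b";

// Without a capturing group the separators are dropped from the result.
var withoutGroup = sample.split(/<\/?ul>/);

// With a capturing group the separators are kept as their own elements.
var withGroup = sample.split(/(<\/?ul>)/);

console.log(withoutGroup); // [ 'a', '\n<li>1</li>\n', 'b' ]
console.log(withGroup);    // [ 'a', '<ul>', '\n<li>1</li>\n', '</ul>', 'b' ]
```

The g flag is left out here because split() splits on every match regardless of it.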

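EDIT: To keep each <ul> block together with its contents, the pattern has to match across newlines. By default . does not match line breaks in JavaScript, so .*? can never get from <ul> to </ul> when they are on different lines. You can use [\s\S] instead, which matches any character: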

var comps = body.split(/(<ul>[\s\S]*?<\/ul>)/g);

Which should give you:

[ 'soupp\n\nWhat a bloody nice video!! :)) {{youtube:hyYnAioXOqQ}}\n\nSuppp\n\n',
'<ul>\n<li>1\n</li>\n<li><b>2</b>\n</li>\n</ul>',
'\n{{attachment:2938222}}\n\n',
'<ul>\n<li>1\n</li>\n<li>2\n</li>\n</ul>',
'\n<ol>\n<li>bruhh\n</li>\n<li>twotwo\n</li>\n</ol>' ]
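
To see the difference in isolation, here is a minimal sketch comparing the original pattern, the [\s\S] version, and the `s` (dotAll) flag, which is an alternative in engines that support ES2018. The `sample` string is a made-up short input, not the one from the question.

```javascript
// Hypothetical short input with a multi-line <ul> block.
var sample = "intro\n<ul>\n<li>1</li>\n</ul>\noutro";

// "." stops at line breaks, so this pattern never matches and nothing is split.
var dotOnly = sample.split(/(<ul>.*?<\/ul>)/i);

// [\s\S] matches any character, including "\n".
var anyChar = sample.split(/(<ul>[\s\S]*?<\/ul>)/i);

// The "s" (dotAll) flag makes "." match "\n" as well (ES2018+ only).
var dotAll = sample.split(/(<ul>.*?<\/ul>)/is);

console.log(dotOnly.length); // 1
console.log(anyChar);        // [ 'intro\n', '<ul>\n<li>1</li>\n</ul>', '\noutro' ]
console.log(dotAll);         // same as anyChar
```

If older browsers need to be supported, the [\s\S] form is the safer choice of the two.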

Upvotes: 1
