Ghoul Fool

Reputation: 6967

How do I split a string consisting of multiple parts using regular expressions in JavaScript?

I'm having problems with regular expressions in JavaScript. I've got a number of strings whose parts need to be delimited by commas. Unfortunately, the substrings don't have quotes around them, which would make life easier.

var str1 = "Three Blind Mice 13 Agents of Cheese Super 18"
var str2 = "An Old Woman Who Lived in a Shoe 7 Pixies None 12"
var str3 = "The Cow Jumped Over The Moon 21 Crazy Cow Tales Wonderful 9"

They are in the form PHRASE1 (mixed type, with spaces) INTEGER1 (one or two digits) PHRASE2 (mixed type, with spaces) WORD1 (single word, mixed type, no spaces) INTEGER2 (one or two digits)

so I should get:

result1 = "Three Blind Mice,13,Agents of Cheese,Super,18"
result2 = "An Old Woman Who Lived in a Shoe,7,Pixies,None,12"
result3 = "The Cow Jumped Over The Moon,21,Crazy Cow Tales,Wonderful,9"

I've looked at txt2re.com, but couldn't quite get what I need and ended up delimiting by hand. I'm sure it can be done, albeit by someone with a bigger brain. There are lots of regex examples, but I couldn't find any that deal with phrases, so I was wondering if anyone could help me out. Thank you.

Upvotes: 3

Views: 1345

Answers (2)

PatrikAkerstrand

Reputation: 45731

Here's an attempt at a regular expression that works for your example strings:

/^\b((?:[a-z]+ ?)+)\b (\d{1,2}) \b((?:[a-z]+ ?)+)\b (\b[a-z]+\b) (\d{1,2})$/i

Basically, it consists of five different parts, each designed to match your descriptions:

  1. \b((?:[a-z]+ ?)+)\b = Matches a word consisting of a-z, optionally followed by a space, repeated as many times as possible (the i-flag makes the search case-insensitive).
  2. (\d{1,2}) = Matches one or two digits. Could also be written as [0-9]{1,2}.
  3. \b((?:[a-z]+ ?)+)\b = Same as nr 1.
  4. (\b[a-z]+\b) = Matches a single word consisting of a-z
  5. (\d{1,2}) = Same as nr 2.
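
To see what each of the five groups captures, the pattern can be run with match() instead of replace(). This is only a small sketch: the names parts, sample and groups are illustrative and not from the original demo.

```javascript
// Same pattern as in the answer; `parts` and `sample` are illustrative names.
const parts = /^\b((?:[a-z]+ ?)+)\b (\d{1,2}) \b((?:[a-z]+ ?)+)\b (\b[a-z]+\b) (\d{1,2})$/i;
const sample = "Three Blind Mice 13 Agents of Cheese Super 18";

// match() returns null when the string does not fit the pattern,
// otherwise an array whose indices 1..5 hold the captured groups.
const m = sample.match(parts);
const groups = m ? m.slice(1) : [];

console.log(groups);
// [ 'Three Blind Mice', '13', 'Agents of Cheese', 'Super', '18' ]
```

Index 0 of the match array holds the whole matched string, which is why the sketch slices it off before printing.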

Things to note:

  1. I have anchored the match to ensure it only considers the full string using ^ (start of subject) and $ (end of subject).
  2. After the regex delimiters /.../, I have added a flag that alters how the regex engine behaves. The i-flag makes the match Case-Insensitive.
  3. A caveat is that contractions like "it's" will NOT be matched by the current regex. You will need to modify groups 1 and 3 to accommodate them.
  4. The groups are separated by a single space. If this can vary, you need to modify the group separators.
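
One possible way to address notes 3 and 4 is sketched below. It is an assumption about how the pattern could be adapted, not part of the original answer: apostrophes are added to the character class (which also lets a lone apostrophe count as a word), the \b anchors are dropped, and \s+ replaces the single-space separators. The names looseRegex and messy are hypothetical.

```javascript
// Hypothetical variant: words may contain apostrophes, and any run of
// whitespace may separate words and groups.
const looseRegex = /^([a-z']+(?:\s+[a-z']+)*)\s+(\d{1,2})\s+([a-z']+(?:\s+[a-z']+)*)\s+([a-z']+)\s+(\d{1,2})$/i;

// Made-up input with a contraction and irregular spacing between groups.
const messy = "It's Raining Men  5   Don't Stop Now Great 42";

// Whitespace between groups is consumed by \s+ and replaced by a comma;
// whitespace inside a phrase is kept exactly as it appears in the input.
const cleaned = messy.replace(looseRegex, "$1,$2,$3,$4,$5");

console.log(cleaned);
// "It's Raining Men,5,Don't Stop Now,Great,42"
```

Writing each phrase as one word followed by zero or more "whitespace plus word" repetitions avoids nesting an optional separator inside a repeated group, which keeps backtracking modest on strings that do not match.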

You can use it as follows:

var regex = /^\b((?:[a-z]+ ?)+)\b (\d{1,2}) \b((?:[a-z]+ ?)+)\b (\b[a-z]+\b) (\d{1,2})$/i;
var s = "The Cow Jumped Over The Moon 21 Crazy Cow Tales Wonderful 9";
s = s.replace(regex, '$1, $2, $3, $4, $5');

JS-fiddle demo here

Edit: I've updated the demo to create a variable named resultCollection to hold the processed results. It's an object consisting of each original string as key, and the processed resulting string as the value.
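
The demo itself is not reproduced here, so the following is only a guess at what building such an object might look like. The name resultCollection comes from the edit note above; the inputs array and the loop are assumptions.

```javascript
const regex = /^\b((?:[a-z]+ ?)+)\b (\d{1,2}) \b((?:[a-z]+ ?)+)\b (\b[a-z]+\b) (\d{1,2})$/i;

// The three example strings from the question.
const inputs = [
  "Three Blind Mice 13 Agents of Cheese Super 18",
  "An Old Woman Who Lived in a Shoe 7 Pixies None 12",
  "The Cow Jumped Over The Moon 21 Crazy Cow Tales Wonderful 9"
];

// Map each original string (key) to its processed form (value).
const resultCollection = {};
for (const s of inputs) {
  resultCollection[s] = s.replace(regex, "$1, $2, $3, $4, $5");
}

console.log(resultCollection[inputs[1]]);
// "An Old Woman Who Lived in a Shoe, 7, Pixies, None, 12"
```

Note that replace() returns a string unchanged when the pattern does not match, so a malformed input would end up mapped to itself rather than raising an error.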

Upvotes: 3

João Silva

Reputation: 91369

Try the following:

var s = "Three Blind Mice 13 Agents of Cheese Super 18";
s.replace(/([^\d]+) (\d{1,2}) ([^\d]+) ([A-Z][a-z]+) (\d{1,2})/, '$1, $2, $3, $4, $5')
// "Three Blind Mice, 13, Agents of Cheese, Super, 18"

DEMO.

Upvotes: 3
