Vaune

Reputation: 47

Complex string parsing in Javascript

I am attempting to parse a complex string in JavaScript, and I'm pretty horrible with Regular Expressions, so I haven't had much luck. The data is loaded into a variable formatted as follows:

Miami 2.5 O (207.5) 125.0 | Oklahoma City -2.5 U (207.5) -145.0 (Feb 20, 2014 08:05 PM)

I am trying to parse that string following these parameters:

1) Each value must be loaded into its own variable (e.g. separate variables for Miami, 2.5 O, (207.5), etc.)
2) String must split at pipe character (I have this working with .split(" | ") )
3) I am dealing with city names that include spaces
4) The date at the end must be isolated and removed

I have a feeling regular expressions must be used, but I'm seriously hoping there is a different way to approach this. The example provided is just that, an example from a much larger data set. I can provide the full data set if requested.

More direct version of my question: Given the data above, what concepts / procedures can I use to intelligently parse the string elements into their own variables?

If RegEx must be used, will I need multiple expressions?

Thanks in advance for your help!

EDIT: In an effort to supply multiple pathways to a solution I'll explain the overarching problem as well. This data is returned from an RSS / XML item. The string mentioned above is sports odds, and it is all contained in the title node of the feed I'm using. If anyone has a better XML / RSS feed for sports odds, I would be ecstatic for that as well.

EDIT 2: Thanks to the replies, I can run a RegEx that matches the data points needed. I'm now having trouble iterating through the matches and returning them correctly. I have the RegEx loaded into its own function:

function regExExtract (txt){
    var exp = /([^|\d]+) ([-\d.]+ [A-Z]) (\([^)]+\)) ([-\d.]+) (\([^)]+\))?/g;
    var comp_arr = exp.exec(txt);

    return comp_arr;        
}

And it is being called with:

var title_arr = regExExtract(title);  

Title is loaded with the data string listed above. I assume I'm using the global flag correctly to ensure all matches are considered, but I'm not sure I'm loading the matches correctly. I apologize for my ignorance, this is all brand new to me.

As requested below, my expected output is ultimately a table with a row for each city, and its subsequent data. Each cell in each row corresponds to a data point.

I have created a JS Fiddle with what I've done, and what the expected output is: http://jsfiddle.net/vDkQD/2/

Potential Final Edit: With the assistance of Robin and rewt, I have come up with:
http://jsfiddle.net/hMJx3/

Upvotes: 1

Views: 636

Answers (2)

Robin

Reputation: 9644

Wouldn't a regex like

/([^|\d]+) ([-\d.]+ [A-Z]) (\([^)]+\)) ([-\d.]+) (\([^)]+\))?/g

do the trick? Obviously, this is based on the example string you gave, and if there are other patterns possible this should be updated... But if it is that fixed it's not so complicated.

Afterwards you just have to go through the captured groups for each match, and you'll have your data parsed. Live demo for fun: http://regex101.com/r/kF5zD3

Explanation

  • [^|\d] everything but a pipe or a digit. This is to account for strange city names that [a-zA-Z ] might not catch
  • [-\d.] a digit, a dot or a hyphen
  • \([^)]+\) opening parenthesis, everything that isn't a closing parenthesis, closing parenthesis.

Quick incomplete pointers on regex

  • Here, the regex is the part between the two /. The g after it is a flag: thanks to it the regex won't stop after hitting the first match and can find every match
  • The match is what the whole expression will find. Here, the match will be everything between two | in your string. The capturing groups are a very useful tool that allows you to extract data from this match: they are delimited by parentheses, which are special characters in regex. (a)b will match ab, and the first captured group of this match will be a
  • [...] means any one of the characters inside will do. [abc] will match a or b or c.
  • + is a quantifier, another special character, meaning "one or more of what precedes me". a+ means "one or more a" and will match aaaaa.
  • \d is a shortcut for [0-9] (yes, - is a special range character inside of [...]. That's why in [-\d.], which is equivalent to [-0-9.], it directly follows the opening bracket)
  • since parentheses are special characters, when you actually want to match a parenthesis you need to escape it: regex (\(a\))b will match (a)b, and the first captured group of this match will be (a) with the parentheses
  • ? means what precedes is optional (zero or one instances)
  • ^ when put at the beginning of a [...] statement means "everything but what's in the brackets". [^a]+ will match bcd-*ù but not aa
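A few of the pointers above can be checked directly in JavaScript. This is only a small sketch; the variable names are made up for illustration and the test strings are the ones used in the list, plus a fragment of the question's sample data.

```javascript
// Capturing group: the whole match is "ab", the first group is "a".
var groupDemo = /(a)b/.exec("ab");

// Escaped parentheses are matched literally and can still be captured.
var escapedDemo = /(\(a\))b/.exec("(a)b");

// + means "one or more of what precedes me".
var plusDemo = /a+/.exec("aaaaa");

// [-\d.] accepts a hyphen, a digit or a dot, so it picks up a signed decimal.
var numberDemo = /[-\d.]+/.exec("Miami -2.5 U");

// [^a] is a negated class: any character except "a".
var negatedOk = /^[^a]+$/.test("bcd-*ù");
var negatedFail = /^[^a]+$/.test("aa");
```

In each exec result, index 0 holds the whole match and index 1 onwards hold the captured groups.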

If you really know nothing about regex, I suggest you take a quick look at a tutorial, just to get a better idea of what you're dealing with, since I believe regex is the right tool for your case. The way to set flags, loop through matches and their respective captured groups will depend on your language and how you call your regex.
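For JavaScript specifically, one way to loop through the matches is to call exec repeatedly on a regex that has the g flag: each call returns the next match, and null once there are none left. The sketch below is only an illustration. The function name extractOdds and the property names (city, line, total, price, date) are assumptions, not anything defined by the feed. Note that the first group of the second match keeps the space that follows the pipe, hence the trim.

```javascript
// Hypothetical helper: collects the captured groups of every match into an array.
function extractOdds(title) {
    var exp = /([^|\d]+) ([-\d.]+ [A-Z]) (\([^)]+\)) ([-\d.]+) (\([^)]+\))?/g;
    var rows = [];
    var match;

    // With the g flag, exec resumes after the previous match on each call.
    while ((match = exp.exec(title)) !== null) {
        rows.push({
            city: match[1].trim(),   // assumed label: team / city name
            line: match[2],          // assumed label: e.g. "2.5 O"
            total: match[3],         // assumed label: e.g. "(207.5)"
            price: match[4],         // assumed label: e.g. "125.0"
            date: match[5] || null   // optional last group, only present on the final entry
        });
    }
    return rows;
}

var sampleTitle = "Miami 2.5 O (207.5) 125.0 | Oklahoma City -2.5 U (207.5) -145.0 (Feb 20, 2014 08:05 PM)";
var sampleRows = extractOdds(sampleTitle);
```

Each element of sampleRows then corresponds to one table row, with one property per cell.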

Upvotes: 2

TimWolla

Reputation: 32701

[A-Z][a-z]+( [A-Z][a-z]+)* -?[0-9]+\.[0-9] [OU] \(-?[0-9]+\.[0-9]\) -?[0-9]+\.[0-9]

This should match a single part of your long string under the following assumptions:

  • The city consists only of alpha characters, each word starts with an uppercase character and is at least 2 characters long.
  • Numbers have an optional sign and exactly one digit after the decimal point
  • The single character is either O or U

Now it is up to you to:

  • Properly create capturing parentheses
  • Check whether my assumptions are right
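As a rough sketch of both points, the pattern could be given capturing parentheses and anchored so that a part which breaks the assumptions is rejected instead of half-matched. The function name parseSide and the property names are hypothetical, and the city class is written as [A-Z] to match the stated assumption that each word starts with an uppercase character.

```javascript
// One capturing group per value; (?: ... ) groups the extra city words without capturing them.
var sideExp = /^([A-Z][a-z]+(?: [A-Z][a-z]+)*) (-?[0-9]+\.[0-9]) ([OU]) \((-?[0-9]+\.[0-9])\) (-?[0-9]+\.[0-9])$/;

// Hypothetical helper: parses one part of the string (already split at the pipe, date removed).
function parseSide(side) {
    var m = sideExp.exec(side);
    if (m === null) {
        return null; // the input does not fit the assumptions
    }
    return {
        city: m[1],
        spread: m[2],     // assumed label for the first number
        overUnder: m[3],  // "O" or "U"
        total: m[4],      // assumed label for the number in parentheses
        price: m[5]       // assumed label for the last number
    };
}

var okSide = parseSide("Oklahoma City -2.5 U (207.5) -145.0");
var badSide = parseSide("St. Louis 1.5 O (200.0) 110.0"); // made-up input with a dot in the city name
```

A null result is a cheap way to find the entries in the larger data set where the assumptions do not hold.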

In order to match the date:

\([JFMASOND][a-z]{2} [0-9]?[0-9], [0-9]{4} [0-9]{2}:[0-9]{2} [AP]M\)$
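A minimal sketch of how this could be used to isolate and remove the date before splitting at the pipe. The function name splitTitle and the shape of the returned object are assumptions made for illustration.

```javascript
var dateExp = /\([JFMASOND][a-z]{2} [0-9]?[0-9], [0-9]{4} [0-9]{2}:[0-9]{2} [AP]M\)$/;

// Hypothetical helper: returns the date (without parentheses) and the remaining parts.
function splitTitle(title) {
    var found = title.match(dateExp);
    var date = found ? found[0].slice(1, -1) : null;

    // Remove the date, drop the space left in front of it, then split at the pipe.
    var sides = title.replace(dateExp, "").trim().split(" | ");

    return { date: date, sides: sides };
}

var parsedTitle = splitTitle("Miami 2.5 O (207.5) 125.0 | Oklahoma City -2.5 U (207.5) -145.0 (Feb 20, 2014 08:05 PM)");
```

Each entry of parsedTitle.sides can then be matched on its own against the first expression.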

Upvotes: 1
