cubiclewar

Reputation: 1579

Parsing units with javascript regex

Say I have a string containing some units (which may or may not have prefixes) that I want to break into the individual units. For example, the string may contain "Btu(th)", "Btu(th).ft", or even "mBtu(th).ft", where mBtu(th) is the bastardised unit milli thermochemical BTU (this is purely an example).

I currently have the following (simplified) regex; however, it fails for the case "mBtu(th).ft":

/(m|k)??(Btu\(th\)|ft|m)(?:\b|\s|$)/g

Currently this does not correctly detect the boundary between the end of 'Btu(th)' and the start of 'ft'. I understand JavaScript regex does not support lookbehind, so how do I accurately parse the string?


Upvotes: 4

Views: 203

Answers (3)

m.cekiera

Reputation: 5395

I would try with:

/((m)|(k)|(Btu(\(th\))?)|(ft)|(m)|(?:\.))+/g

At least with the example above, it matches all the units merged into one string. DEMO

EDIT

Another try (DEMO):

/(?:(m)|(k)|(Btu)|(th)|(ft)|[\.\(\)])/g

This one again matches only one part per match, but if you use $1, $2, $3, $4, etc. (DEMO), you can extract the other fragments. It ignores the ., (, and ) characters. The difficulty is working out which groups actually matched, but it works to some degree.

Or, if you accept multiple separate matches, I think a simple alternative is:

/(m|k|Btu|th|ft)/g 
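As a rough sketch of what this last pattern returns for the question's example (the variable names are only illustrative, not from the original post), `String.prototype.match` with the `g` flag gives one entry per fragment:

```javascript
// Illustrative only: `simple` and `fragments` are hypothetical names.
const simple = /(m|k|Btu|th|ft)/g;

// With the g flag, match() returns every matched substring in order.
const fragments = "mBtu(th).ft".match(simple);

console.log(fragments); // [ 'm', 'Btu', 'th', 'ft' ]
```

Note that "Btu(th)" comes back as two separate fragments, and a leading "m" prefix cannot be told apart from the unit "m", so the caller would still have to reassemble the pieces.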

Upvotes: 2

Jan

Reputation: 5815

I believe you're after something like this, if I understood correctly that you want to match any kind of element, possibly preceded by the m or k character and separated by parentheses or dots.

/[\s\.\(]*(m|k?)(\w+)[\s\.\)]*/g

https://regex101.com/r/eQ5nR4/2

If you don't care about matching the parentheses and just want to return the elements, you can simply use:

/(m|k?)(\w+)/g

https://regex101.com/r/oC1eP5/1
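To see what the shorter pattern captures, here is a small sketch run against the question's example. The names `pattern` and `parts` are assumptions for illustration only:

```javascript
// Illustrative sketch; `pattern` and `parts` are hypothetical names.
const pattern = /(m|k?)(\w+)/g;

// Collect [prefix, element] pairs from every match.
const parts = Array.from(
  "mBtu(th).ft".matchAll(pattern),
  (match) => [match[1], match[2]]
);

console.log(parts); // [ [ 'm', 'Btu' ], [ '', 'th' ], [ '', 'ft' ] ]
```

Because `\w+` accepts any run of word characters, "th" is reported as its own element and the parentheses are dropped, so this suits cases where the exact unit spelling does not matter.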

Upvotes: 0

Anonymous

Reputation: 12017

A word boundary will not separate two non-word characters. So, you don't actually want a word boundary since the parentheses and period are not valid word characters. Instead, you want the string to not be followed by a word character, so you can use this instead:

[mk]??(Btu\(th\)|ft|m)(?!\w)

Demo
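A minimal sketch of how this could be used follows. Two things here are assumptions rather than part of the pattern above: `parseUnits` is a hypothetical helper name, and the prefix is wrapped in a capturing group so it can be reported separately.

```javascript
// Sketch only: parseUnits is a hypothetical helper. The prefix is put in a
// capturing group (an addition to the pattern above) so it can be returned.
function parseUnits(input) {
  const pattern = /([mk])??(Btu\(th\)|ft|m)(?!\w)/g;
  return Array.from(input.matchAll(pattern), (match) => ({
    prefix: match[1] || null, // null when no prefix took part in the match
    unit: match[2],
  }));
}

console.log(parseUnits("mBtu(th).ft"));
// [ { prefix: 'm', unit: 'Btu(th)' }, { prefix: null, unit: 'ft' } ]
console.log(parseUnits("km"));
// [ { prefix: 'k', unit: 'm' } ]
```

The lazy `??` makes the engine try "no prefix" first, so a bare "m" is read as the unit; only when the lookahead rejects that reading (as in "mBtu(th)") does it fall back to treating the leading letter as a prefix.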

Upvotes: 0
