Reputation: 1579
Say I have a string which contains some units (which may or may not have prefixes) that I want to break into the individual units. For example the string may contain "Btu(th)" or "Btu(th).ft" or even "mBtu(th).ft" where mBtu(th) is the bastardised unit milli thermochemical BTU's (this is purely an example).
I currently have the following (simplified) regex however it fails for the case "mBtu(th).ft":
/(m|k)??(Btu\(th\)|ft|m)(?:\b|\s|$)/g
Currently this does not correctly detect the boundary between the end of 'Btu(th)' and the start of 'ft'. I understand javascript regex does not support look back so how do I accurately parse the string?
Upvotes: 4
Views: 203
Reputation: 5395
I would try with:
/((m)|(k)|(Btu(\(th\))?)|(ft)|(m)|(?:\.))+/g
at least with example above, it matches all units merged into one string. DEMO
EDIT
Another try (DEMO):
/(?:(m)|(k)|(Btu)|(th)|(ft)|[\.\(\)])/g
this one again match only one part, but if you use $1,$2,$3,$4, etc, (DEMO) you can extract other fragments. It ignores .
, (
, )
, characters. The problem is to count proper matched groups, but it works to some degree.
Or if you accept multiple separate matches I think simple alternative is:
/(m|k|Btu|th|ft)/g
Upvotes: 2
Reputation: 5815
I believe you're after something like this. If I understood you correctly that want to match any kind of element, possibly preceded by the m
or k
character and separated by parantheses or dots.
/[\s\.\(]*(m|k?)(\w+)[\s\.\)]*/g
https://regex101.com/r/eQ5nR4/2
If you don't care about being able to match the parentheses but just return the elements you can just do
/(m|k?)(\w+)/g
https://regex101.com/r/oC1eP5/1
Upvotes: 0
Reputation: 12017
A word boundary will not separate two non-word characters. So, you don't actually want a word boundary since the parentheses and period are not valid word characters. Instead, you want the string to not be followed by a word character, so you can use this instead:
[mk]??(Btu\(th\)|ft|m)(?!\w)
Upvotes: 0