Reputation: 973
This might not be a specific programming issue, but I am trying to find chemical formulas like H2O, CO2, etc. in a scientific text, and I use this:
(?<=[\l\u]|\.)\d+
This works, but now the digits after the decimal point of every floating-point number are found as well:
0.1234 -> 1234 is selected.
Is there a way to prevent this? Thanks in advance!
Upvotes: 2
Views: 1598
Reputation: 33918
If you also want to match strings like H2O, CH3CH2CH2CH3, SiO2, you could use:
(?i)\b[a-z]+(?:\d+[a-z]+)*\d*\b
or
\b(?:[A-Z][a-z]?)+(?:\d+(?:[A-Z][a-z]?)+)*\d*\b
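A minimal Python sketch of the element-symbol pattern, written here with an optional `\d*` before the closing `\b` so that formulas ending in a count (SiO2, CH3CH2CH2CH3) match as whole tokens. The name `find_formulas` and the sample sentence are made up for illustration; only the standard `re` module is assumed.

```python
import re

# One or more element symbols (capital letter + optional lowercase letter),
# with digit counts allowed between symbols and after the last one.
FORMULA = re.compile(r"\b(?:[A-Z][a-z]?)+(?:\d+(?:[A-Z][a-z]?)+)*\d*\b")

def find_formulas(text):
    """Return every whole token in `text` that looks like a chemical formula."""
    # The pattern has no capturing groups, so findall returns full matches.
    return FORMULA.findall(text)

sample = "water is H2O, butane is CH3CH2CH2CH3, quartz is SiO2, x = 0.1234"
print(find_formulas(sample))  # ['H2O', 'CH3CH2CH2CH3', 'SiO2']
```

Note that this also accepts bare capitalised tokens such as "He" or "I", so you may want to keep only matches that contain a digit if plain words are a problem in your text.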
Upvotes: 1
Reputation: 665090
You could also add a negative lookbehind that rejects a match when the preceding dot itself has a digit before it:
(?<=[\l\u.])(?<!\d\.)\d+
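To try this quickly, here is a small Python sketch of the same idea. Python's `re` module has no `\l` or `\u` classes, so `[A-Za-z]` is substituted here as an assumption about what they stand for in your regex flavour; `find_subscripts` is just an illustrative name.

```python
import re

# Assumption: [A-Za-z] replaces the \l and \u classes, which Python's re lacks.
# (?<=[A-Za-z.])  the digits must follow a letter or a dot
# (?<!\d\.)       but not a dot that itself follows a digit (a decimal fraction)
SUBSCRIPT = re.compile(r"(?<=[A-Za-z.])(?<!\d\.)\d+")

def find_subscripts(text):
    """Return digit runs after a letter or dot, skipping decimal fractions."""
    return SUBSCRIPT.findall(text)

print(find_subscripts("H2O, CO2, 0.1234 and .5"))  # ['2', '2', '5']
```

Both lookbehinds are fixed-width (one and two characters), which is what Python requires; the digits of 0.1234 are skipped because the dot before them is preceded by a digit, while ".5" is still found.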
Upvotes: 1