Reputation: 348
I would like to build a regex pattern that matches the following possibilities:
11
2.5
ca. 111g
ca. 120 g Case
11 Kilograms
12.5-125.0 g
ca. 120% g
In these cases I should always get 4 groups (using "ca. 12.5-125.0% g" as an example):
I have already built this regex, but it does not work as I want in all of the situations above:
(\d*[.]?[-]?\d+(?:\s*|\s+))(\w*)(\D)
For example, the groups are not always built correctly: sometimes "g" lands in the third group and sometimes in the fourth.
Upvotes: 0
Views: 54
Reputation: 163477
The g can land in either the third or the fourth group because `\D` matches any char except a digit, so it can also match the chars a-z that `\w` can match.
For example, in the string `1ga` the g is in group 2. In the string `1g` the g is in group 3, as the word characters are optional and `\D` expects at least a single char.
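To make the backtracking visible, here is a small Python sketch that runs the pattern from the question against a few inputs and prints the capture groups. Using Python's `re` module and `re.search` is an assumption on my part, since the question does not name a regex flavor.

```python
import re

# The pattern from the question, unchanged.
original = re.compile(r"(\d*[.]?[-]?\d+(?:\s*|\s+))(\w*)(\D)")

def groups_for(text):
    """Return the three capture groups for the first match, or None."""
    match = original.search(text)
    return match.groups() if match else None

for sample in ["1ga", "1g", "ca. 111g", "ca. 120 g Case"]:
    print(repr(sample), "->", groups_for(sample))

# '1ga'            -> ('1', 'g', 'a')      g stays in group 2, \D takes the "a"
# '1g'             -> ('1', '', 'g')       \w* gives the g back so \D can match
# 'ca. 111g'       -> ('111', '', 'g')
# 'ca. 120 g Case' -> ('120 ', 'g', ' ')   \D takes the space after the g
```

The unit moves between groups depending on whether any non-digit character follows it.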
Note that this part of the pattern, `(?:\s*|\s+)`, can be written as `\s*`. You can use `\s` in the pattern, but it can also match a newline.
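Both points can be checked quickly. This is a sketch in Python (an assumed flavor), and the `\d+ ... g` wrapper around the whitespace part is only there for illustration:

```python
import re

# The alternation and the plain quantifier accept the same inputs.
alternation = re.compile(r"\d+(?:\s*|\s+)g")
simplified = re.compile(r"\d+\s*g")

samples = ["120g", "120 g", "120   g", "120\ng"]
same = all(
    bool(alternation.fullmatch(s)) == bool(simplified.fullmatch(s))
    for s in samples
)

# \s also matches a newline, while a literal space does not.
newline_sample = "120\ng"
matches_with_s = bool(re.fullmatch(r"\d+\s*g", newline_sample))
matches_with_space = bool(re.fullmatch(r"\d+ *g", newline_sample))

print(same, matches_with_s, matches_with_space)  # True True False
```

If the input can span several lines, a literal space or `[ \t]` is the safer choice.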
One option is to make the pattern a bit more specific and list the allowed special symbols in a character class such as `[%]?`:
^(?:(\w+\.) )?(\d+(?:\.\d+)?(?:-\d+(?:\.\d+)?)?)([%]?)(?: ?(\w+))?
The pattern matches:
- `^` Start of string
- `(?:(\w+\.) )?` Optionally match capture group 1, which matches 1+ word chars and a dot, followed by a space
- `(` Capture group 2
  - `\d+(?:\.\d+)?` Match 1+ digits with an optional decimal part
  - `(?:-\d+(?:\.\d+)?)?` Optionally match `-` and 1+ digits with an optional decimal part
- `)` Close group 2
- `([%]?)` Capture group 3, match an optional "special" char
- `(?: ?(\w+))?` Optionally match an optional space followed by capture group 4, which matches 1+ word characters
Without an anchor, you could also use a word boundary `\b`, and if the dot at the beginning is not always there, you can make it optional with `\.?`:
\b(?:(\w+\.?) )?(\d+(?:\.\d+)?(?:-\d+(?:\.\d+)?)?)([%]?)(?: ?(\w+))?
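As a quick check, the anchored pattern can be run against the inputs from the question. This is a sketch in Python (an assumed flavor). It uses the variant in which the decimal part after the dash is optional, that is `(?:-\d+(?:\.\d+)?)?`, as the explanation describes. The helper name `parse` is hypothetical.

```python
import re

# Anchored pattern; the decimal part after the dash is optional here.
pattern = re.compile(
    r"^(?:(\w+\.) )?"                       # group 1: optional prefix such as "ca."
    r"(\d+(?:\.\d+)?(?:-\d+(?:\.\d+)?)?)"   # group 2: number or range
    r"([%]?)"                               # group 3: optional special char
    r"(?: ?(\w+))?"                         # group 4: optional unit
)

def parse(text):
    """Return the four capture groups, or None if there is no match."""
    match = pattern.match(text)
    return match.groups() if match else None

samples = [
    "11", "2.5", "ca. 111g", "ca. 120 g Case", "11 Kilograms",
    "12.5-125.0 g", "ca. 120% g", "ca. 12.5-125.0% g",
]
for sample in samples:
    print(repr(sample), "->", parse(sample))

# 'ca. 111g'          -> ('ca.', '111', '', 'g')
# '11 Kilograms'      -> (None, '11', '', 'Kilograms')
# 'ca. 12.5-125.0% g' -> ('ca.', '12.5-125.0', '%', 'g')
```

Groups that do not take part in the match come back as `None` in Python, while group 3 is an empty string when there is no `%`, because the `?` sits inside that group.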
Upvotes: 1