Reputation: 5021
I have a string, for example:
s = 'Knorr 12x10g Fish bouillon cube'
I want to extract the 12x10g part using a regex. The logic would be to find the first digit and extend the match until the first space. Right now I can only match this specific string, with the following regex:
val = re.findall(r'\s[0-9].x[0-9].g', s)
But my data also contains kg, ml, and other units of weight, so this regex does not work for all of them. Any suggestions? Thanks.
Upvotes: 1
Views: 83
Reputation: 6088
For regex:
\d+\w\d+\w*(?=\s)
Demo: https://regex101.com/r/1orSGQ/1
For Python
import re
text = '''s = 'Knorr 12x10g Fish bouillon cube'
s = 'Knorr 12x10kg Fish bouillon cube'
s = 'Knorr 12x10gram Fish bouillon cube'
'''
for m in re.finditer(r"\d+\w\d+\w*(?=\s)", text):
    # Print each matched quantity, e.g. 12x10g
    print(m.group(0))
Output
12x10g
12x10kg
12x10gram
Upvotes: 0
Reputation: 1127
\s[0-9]{1,}.x[0-9]{1,}[a-z]{1,}\s
Because the pattern matches the whitespace on both sides, you can then call .strip() on the matched string to remove it.
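A minimal sketch of how this might be applied, using the sample string from the question. Note that the `.` before `x` consumes one extra character, so the pattern as written only matches when at least two characters precede the `x`. The `variant` pattern, which drops that `.`, and the second sample string `s2` are assumptions added for illustration, not part of the original answer.

```python
import re

s = 'Knorr 12x10g Fish bouillon cube'
s2 = 'Knorr 1x10kg Fish bouillon cube'  # hypothetical single-digit count

# Pattern from the answer; it captures the surrounding whitespace too.
pattern = r'\s[0-9]{1,}.x[0-9]{1,}[a-z]{1,}\s'
# Assumed variant without the extra '.', so a single digit before 'x' also matches.
variant = r'\s[0-9]+x[0-9]+[a-z]+\s'

# Strip the leading and trailing whitespace from each match.
matches = [m.strip() for m in re.findall(pattern, s)]
matches_s2 = [m.strip() for m in re.findall(pattern, s2)]
variant_s2 = [m.strip() for m in re.findall(variant, s2)]

print(matches)     # ['12x10g']
print(matches_s2)  # []
print(variant_s2)  # ['1x10kg']
```

Since both patterns require whitespace on each side, neither will match a quantity at the very start or end of the string.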
Upvotes: 1
Reputation: 626748
The logic would be to find the first digit and extend it until find a first space.
You may use the \d\S* regex:
import re
s = 'Knorr 12x10g Fish bouillon cube'
val = re.findall(r'\d\S*', s)
print(val)
See the Python demo
The re.findall method will find all non-overlapping occurrences of substrings starting with a digit (\d) with 0+ characters other than whitespace after it (\S*). If the number of non-whitespace characters should be non-zero, replace * with + (1 or more occurrences).
To avoid matching trailing punctuation, you may add \b at the end of the regex pattern (r'\d\S*\b').
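To illustrate the effect of the trailing \b, here is a small sketch. The sample string with punctuation is made up for this example, and using re.search to take only the first occurrence is one possible way to follow the "first digit" logic from the question.

```python
import re

# Hypothetical sample with punctuation directly after the quantities.
s = 'Knorr 12x10g, Fish 2x500ml. bouillon cube'

# Without \b the trailing punctuation is part of the match.
plain = re.findall(r'\d\S*', s)
# With \b the match backtracks to the last word character.
bounded = re.findall(r'\d\S*\b', s)

print(plain)    # ['12x10g,', '2x500ml.']
print(bounded)  # ['12x10g', '2x500ml']

# If only the first occurrence is needed, re.search stops at the first match.
m = re.search(r'\d\S*\b', s)
first = m.group(0) if m else None
print(first)    # 12x10g
```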
Upvotes: 2