muazfaiz

Reputation: 5021

Matching string on a condition using regex

I have a string for example:

s = 'Knorr 12x10g Fish bouillon cube'

I want to extract the 12x10g part using a regex. The logic would be to find the first digit and extend the match until the first space. Right now I can only match this specific string, using the following regex:

val = re.findall(r'\s[0-9].x[0-9].g', s)

But my data also contains kg, ml and other units of measurement, so this regex does not work for all of them. Any suggestions? Thanks.

Upvotes: 1

Views: 83

Answers (3)

Ibrahim

Reputation: 6088

For regex:

\d+\w\d+\w*(?=\s)

Demo: https://regex101.com/r/1orSGQ/1


For Python

import re
text = '''s = 'Knorr 12x10g Fish bouillon cube'
s = 'Knorr 12x10kg Fish bouillon cube'
s = 'Knorr 12x10gram Fish bouillon cube'
'''

for m in re.finditer(r"\d+\w\d+\w*(?=\s)", text):
    print(m.group(0))

Output

12x10g
12x10kg
12x10gram

Upvotes: 0

vsr

Reputation: 1127

\s[0-9]{1,}.x[0-9]{1,}[a-z]{1,}\s

The match includes the surrounding whitespace, so afterwards you can call .strip() on the matched string to remove it.
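
As a minimal sketch of how this could be applied (the sample strings and variable names below are assumptions for illustration, not part of the original answer), re.findall returns each match together with its surrounding whitespace, and .strip() then removes it:

```python
import re

# Pattern from the answer above; the leading and trailing \s are part of the match.
pattern = r'\s[0-9]{1,}.x[0-9]{1,}[a-z]{1,}\s'

# Hypothetical sample strings, modelled on the one in the question.
samples = [
    'Knorr 12x10g Fish bouillon cube',
    'Knorr 12x10kg Fish bouillon cube',
    'Knorr 12x10ml Fish bouillon cube',
]

# Strip the whitespace that the pattern captures on both sides of each match.
cleaned = [m.strip() for s in samples for m in re.findall(pattern, s)]
print(cleaned)
```

Note that the . before x consumes one extra character, so as written the pattern needs at least two characters before the x: 12x10g matches, but a quantity such as 2x10g would not.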

Upvotes: 1

Wiktor Stribiżew

Reputation: 626748

"The logic would be to find the first digit and extend the match until the first space."

You may use the \d\S* regex:

import re
s = 'Knorr 12x10g Fish bouillon cube'
val = re.findall(r'\d\S*', s)
print(val)

See the Python demo

The re.findall method will find all non-overlapping occurrences of substrings starting with a digit (\d) with 0+ characters other than whitespace after it (\S*). If the number of non-whitespaces should be non-zero, replace * with + (1 or more occurrences).

To avoid matching trailing punctuation, you may add \b at the end of the regex pattern (r'\d\S*\b').
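
The effect of the trailing \b can be sketched as follows. The sample string, with a comma and a full stop after the quantities, is an assumption made up for illustration:

```python
import re

# Hypothetical input where the quantities are followed by punctuation.
s = 'Knorr 12x10g, Fish bouillon cube 500ml.'

# Without \b, the run of non-whitespace characters includes the punctuation.
with_punct = re.findall(r'\d\S*', s)

# With \b, the match must end at a word boundary, so trailing punctuation is dropped.
without_punct = re.findall(r'\d\S*\b', s)

# re.search returns only the first match, which suits the "first digit" requirement.
first = re.search(r'\d\S*\b', s)

print(with_punct)
print(without_punct)
print(first.group(0) if first else None)
```

If only one quantity per string is expected, re.search is probably the simpler choice, since it stops at the first match instead of collecting all of them.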

Upvotes: 2
