regex for line of string

Question

Hi, I'm trying to extract some information from a few lines in Python with a regular expression. What I have now is:

([a-zA-Z()]+\S\S)

My lines are:

Butter 100mg x 12
Butter Organic Jelly 100mg x 7
Butter Soft 100mg x 12
3.5g Organic White Loofi
10g Bubblegum
10 x TST Butter 200yg Hofmann
100 x 10mg Jelly (Test)

With the regex above I get the strings Butter, Butter, Organic, Jelly, Butter, Soft, Organic, White, Loofi, Bubblegum, TST, Butter, Jelly, (Test). But I want one string from every line, like: Butter, Butter Organic Jelly, Butter Soft, etc., not separated from each other. What am I doing wrong?
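A minimal sketch reproducing this behaviour, assuming the lines are joined into one newline-separated string and searched with `re.findall` (the question does not say which `re` function is used). The character class `[a-zA-Z()]` contains no space, so every match stops at the first blank and each word comes back as a separate match:

```python
import re

# Sample lines from the question, joined with newlines (an assumption
# about how the text is passed to the regex).
TEXT = "\n".join([
    "Butter 100mg x 12",
    "Butter Organic Jelly 100mg x 7",
    "Butter Soft 100mg x 12",
    "3.5g Organic White Loofi",
    "10g Bubblegum",
    "10 x TST Butter 200yg Hofmann",
    "100 x 10mg Jelly (Test)",
])

# The pattern from the question: letters or parentheses followed by two
# non-whitespace characters. The class has no space in it, so a single
# match can never cover more than one word.
question_pattern = re.compile(r"([a-zA-Z()]+\S\S)")

matches = question_pattern.findall(TEXT)
print(matches)
```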

Flavian Hautbois · Accepted Answer

This regex works for your particular cases: ([A-Z][a-z][A-Za-z()\s]+[a-z)])

What it says is: find a string where

the first character is an uppercase letter (this gets rid of mg),
the second character is a lowercase letter (this rejects TST, so that from TST Butter only Butter is kept),
then come one or more uppercase letters, lowercase letters, parentheses or whitespace characters,
and the last character is a lowercase letter or a closing parenthesis.
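The pattern can be written out with `re.VERBOSE` so that each piece carries the matching explanation from the list above. This is a sketch, assuming the sample lines are joined into one newline-separated string; the names `TEXT` and `ANSWER_PATTERN` are illustrative:

```python
import re

# Sample lines from the question, joined with newlines (an assumption
# about how the text is passed to the regex).
TEXT = "\n".join([
    "Butter 100mg x 12",
    "Butter Organic Jelly 100mg x 7",
    "Butter Soft 100mg x 12",
    "3.5g Organic White Loofi",
    "10g Bubblegum",
    "10 x TST Butter 200yg Hofmann",
    "100 x 10mg Jelly (Test)",
])

# The answer's pattern, spelled out with re.VERBOSE so every piece can be
# commented. Whitespace outside the character classes is ignored.
ANSWER_PATTERN = re.compile(
    r"""
    (
        [A-Z]           # first character: uppercase (rules out mg)
        [a-z]           # second character: lowercase (rules out TST)
        [A-Za-z()\s]+   # one or more letters, parentheses or whitespace
        [a-z)]          # last character: lowercase letter or closing paren
    )
    """,
    re.VERBOSE,
)

matches = ANSWER_PATTERN.findall(TEXT)
for m in matches:
    print(m)
```

Because `\s` also matches a newline, a match could in principle continue onto the next line. In this sample every line break has a digit next to it, so that does not happen; using a literal space instead of `\s` in the class (`[A-Za-z() ]`) would rule it out for other inputs.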

This gives me the following matches:

Butter
Butter Organic Jelly
Butter Soft
Organic White Loofi
Bubblegum
Butter
Hofmann
Jelly (Test)
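These matches do not map one-to-one onto the input lines: the line 10 x TST Butter 200yg Hofmann produces two of them (Butter and Hofmann). A minimal sketch that applies the same pattern line by line, so each result can be traced back to the line it came from (the names `LINES` and `per_line` are illustrative):

```python
import re

# Sample lines from the question.
LINES = [
    "Butter 100mg x 12",
    "Butter Organic Jelly 100mg x 7",
    "Butter Soft 100mg x 12",
    "3.5g Organic White Loofi",
    "10g Bubblegum",
    "10 x TST Butter 200yg Hofmann",
    "100 x 10mg Jelly (Test)",
]

# The regex from the answer.
PATTERN = re.compile(r"([A-Z][a-z][A-Za-z()\s]+[a-z)])")

# Map each input line to the list of matches found in that line alone.
per_line = {line: PATTERN.findall(line) for line in LINES}

for line, found in per_line.items():
    print(f"{line} -> {found}")
```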
