Reputation: 129
I just started learning Python and regex.
My regex:
\b\d+\s+([A-Za-z]* |[A-Za-z]*\s+[A-Za-z]*)\s+\D+..
using https://regex101.com/
string 1: https://i.imgur.com/XNuXftW.jpg (why does Beer have whitespace while carrot/chocolate don't?)
string 2: https://i.imgur.com/nrl2FPB.jpg (adding more \s+[A-Za-z] to the capture group doesn't seem to work anymore. Why?)
string 3: https://i.imgur.com/qH0Z7Hi.jpg (same problem as string 2)
My question is: how do I extend the regex so that it covers all of the above cases? Thank you.
In case you need to test it yourself, I have provided the strings below.
Upvotes: 0
Views: 58
Reputation: 106
I guess the space before "|" is what causes it to capture "Beer " in string 1. "Chocolate cake" does not end up with a trailing space like "Beer " because it is matched by the second alternative, which is
[A-Za-z]*\s+[A-Za-z]*
For string 2, the [A-Za-z]*\s+[A-Za-z]* alternative matches at most two words.
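To see both effects outside regex101, here is a small Python sketch that runs the question's pattern unchanged. The sample lines are made up (the real strings are only in the screenshots), and `captured_name` is just a helper name for this illustration.

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(r"\b\d+\s+([A-Za-z]* |[A-Za-z]*\s+[A-Za-z]*)\s+\D+..")

# Hypothetical sample lines; the asker's real strings are not available.
lines = [
    "1  Beer  $2.50",
    "2  Chocolate cake  $4.00",
    "3  Fresh orange juice  $3.20",
]

def captured_name(line):
    """Return capture group 1 of the question's pattern, or None if no match."""
    m = PATTERN.search(line)
    return m.group(1) if m else None

for line in lines:
    # repr() makes a trailing space visible.
    print(repr(captured_name(line)))
```

With these sample lines, the one-word name is taken by the first alternative (letters plus a literal space), so the space ends up inside the group. The two-word name falls through to the second alternative, and the three-word name is cut off after two words.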
How about trying the regex below, modified from trincot's:
(?<=\s\s)(\w+\s)+(\w+)(?=\s\s)
Upvotes: 1
Reputation: 350300
You could use this regex, which takes advantage of look-behind (?<= ) and look-ahead (?= ) so it only captures the product names:
(?<=\s\s)\w+(?:\s\w+)*(?=\s\s)
See demo on regex101.com.
Use it with the g modifier.
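Since the question mentions Python: there is no g modifier there, but re.findall returns every match, which has the same effect. Below is a minimal sketch using the regex above. The sample lines are invented for illustration (columns separated by two spaces is an assumption), and `product_names` is a hypothetical helper name.

```python
import re

# Look-behind and look-ahead each require two whitespace characters,
# so only the text between the column gaps is returned.
NAME = re.compile(r"(?<=\s\s)\w+(?:\s\w+)*(?=\s\s)")

def product_names(text):
    """Return all product names in text (findall plays the role of the g flag)."""
    return NAME.findall(text)

# Hypothetical sample lines; the asker's real strings are not available.
for line in [
    "1  Beer  $2.50",
    "2  Chocolate cake  $4.00",
    "3  Fresh orange juice  $3.20",
]:
    print(product_names(line))
```

Because the repeated part is a non-capturing group (?: ), findall returns the whole match as a string instead of a tuple of groups.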
Upvotes: 1