Python Extract text in middle of string

Question

I would like to extract the name of the item from the text.

fg['Product'] = pd.Series([' 5 Guys Greasy Burger 3/5LB (24) [51656]', '5 Guys Super Strawberry Shake - (3/4) OZ (9) [5645654]', '5 Guys Giant Loaded Double Cheese Burger 1/2LB Buns - 8Z Cups (22) [564654]'])

What I need in the df column for analysis by product

fg['Product'] = 'Greasy Burger', 'Super Strawberry Shake', 'Giant Loaded Double Cheese Burger'

I have tried multiple things, but this got me the closest.

fg['Product'] = fg['Product'].str.strip('5 Guys').str.replace(r'\[\d+\]')

But this isn't close to getting me there. The logic in the pattern appears to be strip '5 Guys' and then remove anything after the first numeric digit or the first hyphen '-'. Just can't figure it out.

Chris · Accepted Answer

You can apply the regex r"5 Guys ([A-Za-z\s]*)" to every entry. The capture group after "5 Guys " matches only letters and whitespace, so it stops at the first digit, hyphen, or bracket. If some names contain digits, you will need a more sophisticated pattern. I used an online regex tester (e.g. regex101) to build the pattern.

Full code example:

import pandas as pd
import re

# Capture the letters and whitespace that follow "5 Guys "
regex_pattern = r"5 Guys ([A-Za-z\s]*)"

def find_name(full_string):
    match = re.search(regex_pattern, full_string)
    # Return the captured name without surrounding whitespace, or None if there is no match
    return match[1].strip() if match else None

s = pd.Series([' 5 Guys Greasy Burger 3/5LB (24) [51656]', '5 Guys Super Strawberry Shake - (3/4) OZ (9) [5645654]', '5 Guys Giant Loaded Double Cheese Burger 1/2LB Buns - 8Z Cups (22) [564654]'])
names = s.apply(find_name)
print(names)
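
The same idea can also be written without a helper function, using the pandas string accessor. The sketch below is not part of the original answer. It assumes the rule described in the question (drop "5 Guys", then keep everything up to the first digit or hyphen), and the variable names are only illustrative:

```python
import pandas as pd

s = pd.Series([
    ' 5 Guys Greasy Burger 3/5LB (24) [51656]',
    '5 Guys Super Strawberry Shake - (3/4) OZ (9) [5645654]',
    '5 Guys Giant Loaded Double Cheese Burger 1/2LB Buns - 8Z Cups (22) [564654]',
])

# Assumed rule: after "5 Guys", capture every character that is
# neither a digit nor a hyphen, i.e. stop at the first digit or "-".
pattern = r"5 Guys\s+([^\d\-]+)"

# expand=False returns a Series (there is a single capture group);
# .str.strip() removes the trailing space left before the digit or hyphen.
names = s.str.extract(pattern, expand=False).str.strip()
print(names.tolist())
```

Note that str.strip('5 Guys') from the question does not remove a prefix: it treats its argument as a set of characters to strip from both ends, so it also eats the leading "G" of "Greasy".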
