Alejandro L

Reputation: 139

Extract strings in pandas series

I have the following pandas Series:

cfia_recalls_merged['title'].head()
0                                     One Ocean brand Sliced Smoked Wild Sockeye Salmon recalled due to Listeria monocytogenes
1                                                Pastene brand Green Olives Sliced recalled due to container integrity defects
2                                              Casa Italia brand Soppressata Piccante Salami recalled due to possible spoilage
3                                                                                Obiji brand Palm Oil recalled due to Sudan IV
4    One Degree Organic Foods brand Gluten Free Sprouted Rolled Oats recalled due to packaging integrity defects and rancidity
Name: title, dtype: object

I want to extract certain parts of each string and put them in new columns. Example of the desired output:

import pandas as pd

test = {'brand': ['One Ocean', 'Pastene', 'Casa Italia'], 'product': ['Sliced Smoked Wild Sockeye Salmon', 'Green Olives Sliced', 'Soppressata Piccante Salami'], 'hazard': ['Listeria monocytogenes', 'container integrity defects', 'possible spoilage']}
example = pd.DataFrame(test)
example

    brand         product                              hazard
0   One Ocean     Sliced Smoked Wild Sockeye Salmon    Listeria monocytogenes
1   Pastene       Green Olives Sliced                  container integrity defects
2   Casa Italia   Soppressata Piccante Salami          possible spoilage

Essentially, my separators are "brand" and "recalled due to".

How can I do this with regex and capture groups?

Any help is appreciated. Thank you in advance!

Upvotes: 1

Views: 40

Answers (1)

Tim Biegeleisen

Reputation: 520898

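You could use str.extract here, once per new column. Each pattern has a single capture group, and rows that do not match the pattern get NaN: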

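# brand: everything before the first " brand"
cfia_recalls_merged['brand'] = cfia_recalls_merged['title'].str.extract(r'^(.*?) brand\b')
# product: the text between " brand " and " recalled due to"
cfia_recalls_merged['product'] = cfia_recalls_merged['title'].str.extract(r'^.*? brand (.*?) recalled due to\b')
# hazard: everything after "recalled due to "
cfia_recalls_merged['hazard'] = cfia_recalls_merged['title'].str.extract(r'\brecalled due to (.*)$')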
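
Since the question asks about capture groups, the three calls above could probably also be combined into one pattern with named groups. This is only a sketch, not a tested drop-in: the names titles and parts are made up for illustration, the sample rows are copied from the question, and it assumes every title contains both " brand " and " recalled due to ".

```python
import pandas as pd

# Sample rows copied from the question; `titles` is a hypothetical stand-in
# for cfia_recalls_merged['title'].
titles = pd.Series(
    [
        "One Ocean brand Sliced Smoked Wild Sockeye Salmon recalled due to Listeria monocytogenes",
        "Pastene brand Green Olives Sliced recalled due to container integrity defects",
        "Casa Italia brand Soppressata Piccante Salami recalled due to possible spoilage",
    ],
    name="title",
)

# One pattern, three named capture groups. With named groups, str.extract
# returns a DataFrame whose columns are named after the groups.
pattern = r"^(?P<brand>.*?) brand (?P<product>.*?) recalled due to (?P<hazard>.*)$"
parts = titles.str.extract(pattern)

print(parts)
```

If this fits the real data, the result could be attached to the original frame with something like cfia_recalls_merged.join(parts). Note that a title missing either separator would get NaN in all three columns, whereas the separate calls above fail independently per column.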

Upvotes: 1
