Vesnič

Match string between two words in DataFrame

I have a DataFrame, and one of its columns contains text from which I want to extract some information.

Between two words, 'Type' and 'Capacity', there is a string that can contain both numeric and non-numeric characters, and possibly whitespace. I want to save that string to a new column.
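A minimal sketch of a pattern for that requirement, using only the standard re module. The sample string is made up, and the pattern Type\s+(.*?)\s+Capacity is an assumption about the data (at least one whitespace character after 'Type' and before 'Capacity'). The lazy (.*?) lets the captured part contain spaces, which \w+ cannot match.

```python
import re

# Hypothetical sample text; the real column contents are not shown in the question.
text = "Model X Type AB 12-3 Capacity 500 kg"

# \s+ consumes the separating whitespace; the lazy (.*?) captures everything
# up to the first following "whitespace + Capacity", spaces included.
pattern = re.compile(r"Type\s+(.*?)\s+Capacity")

match = pattern.search(text)
between = match.group(1) if match else None
print(between)  # AB 12-3

# For comparison: \w+ stops at the space after "AB", so this finds no match.
print(re.search(r"Type (\w+) Capacity", text))  # None
```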

This is my code:

```python
import re

df['new'] = df['text'].apply(lambda x: re.search(r'Type (\w+) Capacity', x).group(1))
print(df['new'])
```

It doesn't raise an error, but it prints this:

```
Series([], Name: test, dtype: object)
```

I don't understand what is wrong. Thanks for any help.
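A small reproduction, with made-up DataFrames, of two likely causes. First, the printed Series is empty, which suggests the DataFrame itself has no rows: Series.apply on an empty Series returns an empty result without ever calling the function, so no error can surface. Second, on non-empty data the pattern Type (\w+) Capacity fails whenever the text between the words contains a space; re.search then returns None and .group(1) raises AttributeError. The helper name extract_between is hypothetical.

```python
import re

import pandas as pd


def extract_between(x):
    # Same logic as the question's lambda.
    return re.search(r'Type (\w+) Capacity', x).group(1)


# 1) An empty DataFrame: apply never calls the function, so nothing fails
#    and an empty Series is returned.
empty = pd.DataFrame({'text': pd.Series([], dtype=object)})
empty_result = empty['text'].apply(extract_between)
print(empty_result)

# 2) Non-empty (hypothetical) data where the middle part contains a space.
df = pd.DataFrame({'text': ['Type AB12 Capacity 10', 'Type AB 12 Capacity 20']})
try:
    df['text'].apply(extract_between)
    error = None
except AttributeError as exc:  # 'NoneType' object has no attribute 'group'
    error = exc
print(type(error).__name__)
```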

Answers (1)

Wiktor Stribiżew

You can use

```python
df['new'] = df['text'].str.extract(r'Type (\w+) Capacity')
```

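The pandas.Series.str.extract method returns only the captured values (the parts of the match inside parentheses). A row where the pattern does not match gets NaN instead of raising an error, which is what re.search(...).group(1) does when re.search returns None.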
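A minimal sketch of str.extract on made-up data, showing the NaN for a non-matching row. The whitespace-tolerant pattern Type\s+(.*?)\s+Capacity is swapped in here as an assumption (it is not the pattern above), because the question says spaces can occur between the two words; expand=False is passed so the single capture group comes back as a Series.

```python
import pandas as pd

# Hypothetical rows: one plain match, one with spaces, one with no match.
df = pd.DataFrame({'text': [
    'Type AB12 Capacity 10',
    'Type AB 12-3 Capacity 20',
    'no keywords here',
]})

# One capture group + expand=False -> a Series aligned with df's index.
df['new'] = df['text'].str.extract(r'Type\s+(.*?)\s+Capacity', expand=False)
print(df)
```

The third row gets NaN in the new column rather than stopping the whole operation.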

By default (expand=True), str.extract returns a DataFrame with one column per capture group; pass expand=False to get a Series (or Index) instead when the pattern has a single group. Rows with no match get NaN, so .fillna('') may be useful.
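A sketch contrasting the two expand settings on made-up data (the sample strings and the whitespace-tolerant pattern are assumptions): with a single capture group, the default returns a one-column DataFrame, expand=False returns a Series, and .fillna('') replaces the NaN left by the non-matching row.

```python
import pandas as pd

s = pd.Series(['Type AB 12 Capacity 5', 'nothing to extract'])
pattern = r'Type\s+(.*?)\s+Capacity'

as_frame = s.str.extract(pattern)                 # expand=True by default
as_series = s.str.extract(pattern, expand=False)  # single group -> Series

print(type(as_frame).__name__, as_frame.shape)    # DataFrame (2, 1)
print(type(as_series).__name__)                   # Series

# Non-matching rows are NaN; replace them with empty strings if preferred.
cleaned = as_series.fillna('')
print(cleaned.tolist())                           # ['AB 12', '']
```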
