I have a DataFrame, and one of its columns contains text from which I want to extract some information.
The text contains two words, 'Type' and 'Capacity'. Between them is a string of numeric and non-numeric characters, possibly including whitespace, and I want to save that string to a new column.
This is my code:
import re
df['new'] = df['text'].apply(lambda x: re.search(r'Type (\w+) Capacity', x).group(1))
print(df['new'])
It doesn't raise any errors, but it prints this:
Series([], Name: test, dtype: object)
I don't understand what is wrong. Thanks for the help.
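A minimal sketch, using plain `re` and made-up sample strings (the real column contents are not shown), of why the pattern above stops matching once whitespace appears between the keywords: `\w` matches letters, digits and underscore only, so `re.search` returns `None`, and calling `.group(1)` on `None` would raise `AttributeError`. The helper name `extract_type` is hypothetical. Separately, an empty printed Series like the one above is what `apply` returns when the column has no rows at all, so checking `len(df)` may be worthwhile; the question does not say whether that is the case.

```python
import re

pattern = r'Type (\w+) Capacity'

# A single run of word characters between the keywords matches...
m1 = re.search(pattern, 'Type AB12 Capacity 300')
print(m1.group(1))  # AB12

# ...but a space inside that text breaks the match, so re.search returns None.
m2 = re.search(pattern, 'Type AB 12 Capacity 300')
print(m2)  # None

def extract_type(text):
    """Hypothetical helper: return the captured text, or None instead of raising."""
    m = re.search(pattern, text)
    return m.group(1) if m else None
```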
You can use
df['new'] = df['text'].str.extract(r'Type (\w+) Capacity')
The pandas.Series.str.extract method returns only the captured values, that is, the parts of the match inside parenthesized groups.
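A small sketch on hypothetical data showing `str.extract` filling a new column. It passes `expand=False` so a single group comes back as a Series. The second pattern, `Type\s+(.*?)\s+Capacity`, is an assumed variant (not part of the answer) that uses a lazy `.*?` so the captured text may contain spaces, since `\w+` cannot.

```python
import pandas as pd

# Hypothetical sample rows; the real data is not shown in the question.
df = pd.DataFrame({'text': [
    'Type AB12 Capacity 300',
    'Type AB 12 X Capacity 500',
    'no keywords here',
]})

# Pattern from the answer: one run of word characters between the keywords.
df['new'] = df['text'].str.extract(r'Type (\w+) Capacity', expand=False)

# Assumed variant that also allows whitespace between the keywords.
df['new_ws'] = df['text'].str.extract(r'Type\s+(.*?)\s+Capacity', expand=False)

print(df[['new', 'new_ws']])
```

Rows without a match get NaN in both columns; only the second pattern captures `AB 12 X` from the middle row.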
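You can also pass expand=True to make sure a DataFrame is always returned, with one column per capture group (with expand=False, a single group yields a Series, or an Index for Index input, while several groups still yield a DataFrame). Rows with no match get NaN, so if some rows may not match, appending .fillna('') may be useful.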
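A quick sketch on made-up strings contrasting the two `expand` settings and the `fillna('')` step for non-matching rows. In current pandas `expand=True` is the default.

```python
import pandas as pd

s = pd.Series(['Type AB12 Capacity 300', 'no match here'])
pattern = r'Type (\w+) Capacity'

# expand=True: always a DataFrame, one column per capture group.
as_frame = s.str.extract(pattern, expand=True)
print(type(as_frame).__name__, as_frame.shape)  # DataFrame (2, 1)

# expand=False with a single group: a Series instead.
as_series = s.str.extract(pattern, expand=False)
print(type(as_series).__name__)  # Series

# Non-matching rows are NaN; fillna('') turns them into empty strings.
print(as_series.fillna('').tolist())  # ['AB12', '']
```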