Extracting a specific word using Regex in Pandas

Question

I'm trying to extract the name of the country from the following dataframe:

country
0   NaN
1   Country: America
2   Country: France ...More CountriesFranceNorwayP...
3   NaN
4   Country: India

using the following regex statement:

import re
regex = re.compile(
    r"Country: (?P<country>\w+)"
    )

df['country'] = df['country'].str.extractall(regex).droplevel(1)

However, it returns

country
0   NaN
1   NaN
2   NaN
3   NaN
4   NaN

instead of returning

country
0   NaN
1   America
2   France
3   NaN
4   India

What am I missing? Please advise.

Wiktor Stribiżew · Accepted Answer

You can use extract:

df['country'] = df['country'].str.extract(r'Country:\s*(\w+)')

Pandas test:

import pandas as pd
import numpy as np
df = pd.DataFrame({'country' : [np.nan, 'Country: America', 'Country France ... More countries...']})
df['country'].str.extract(r'Country:\s*(\w+)')
#          0
# 0      NaN
# 1  America
# 2      NaN
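
As a minimal sketch, here is the same pattern applied to the sample data from the question. The only addition is `expand=False`, which makes `str.extract` return a Series instead of a one-column DataFrame when the pattern has a single capture group, so the result can be assigned straight back to the column. The dataframe below is retyped from the question, truncated text included.

```python
import numpy as np
import pandas as pd

# Sample data retyped from the question (row 2 is truncated there as well)
df = pd.DataFrame({
    "country": [
        np.nan,
        "Country: America",
        "Country: France ...More CountriesFranceNorwayP...",
        np.nan,
        "Country: India",
    ]
})

# One capture group + expand=False -> a Series aligned with df's index.
# Rows with no match (including NaN inputs) stay NaN.
df["country"] = df["country"].str.extract(r"Country:\s*(\w+)", expand=False)

print(df["country"].tolist())
# [nan, 'America', 'France', nan, 'India']
```

Unlike `extractall`, which returns one row per match under a MultiIndex and leaves out rows that have no match, `extract` takes only the first match and keeps the original index, so no `droplevel` is needed.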
