Stanislav Jirák

Reputation: 485

How to extract certain string from a text?

I have a feature (column), "Location", from which I want to extract the country.

The feature looks like:

data['Location'].head()

0    stockton, california, usa
1    edmonton, alberta, canada
2     timmins, ontario, canada
3      ottawa, ontario, canada
4                n/a, n/a, n/a
Name: Location, dtype: object

I want:

data['Country'].head(3)

0   usa
1   canada
2   canada

I've tried:

data['Country'] = data.Location.str.extract('(+[a-zA-Z])', expand=False)
data[['Location', 'Country']].sample(10)

which returns:

error: nothing to repeat at position 1

When I use '[a-zA-Z]+' instead, it gives me the city.

Help would be appreciated. Thanks.
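The error comes from the + at the start of the group. In a regular expression, + repeats whatever comes before it, and in '(+[a-zA-Z])' nothing comes before it. The corrected '[a-zA-Z]+' is valid, but str.extract returns the first match in each string, which is the city. Below is a minimal sketch using the sample rows from the question. It anchors the pattern to the end of the string with $ and captures everything after the last comma, so the country is returned instead:

```python
import re

import pandas as pd

# Sample values copied from the question's Location column
data = pd.DataFrame({"Location": [
    "stockton, california, usa",
    "edmonton, alberta, canada",
    "timmins, ontario, canada",
    "ottawa, ontario, canada",
    "n/a, n/a, n/a",
]})

# '(+[a-zA-Z])' is invalid: '+' has nothing before it to repeat
try:
    re.compile("(+[a-zA-Z])")
except re.error as exc:
    print(exc)  # nothing to repeat at position 1

# '([a-zA-Z]+)' is valid, but extract() returns the FIRST run of letters (the city)
print(data["Location"].str.extract(r"([a-zA-Z]+)", expand=False).tolist())

# Anchor at the end with '$' and capture everything after the last comma
data["Country"] = data["Location"].str.extract(r",\s*([^,]+)$", expand=False)
print(data["Country"].tolist())  # ['usa', 'canada', 'canada', 'canada', 'n/a']
```

Capturing [^,]+ rather than [a-zA-Z]+ keeps the "n/a" row whole. A letters-only pattern anchored with $ would return just "a" for that row.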

Upvotes: 1

Views: 70

Answers (2)

Ankur Sinha

Reputation: 6639

You can also split with a regex pattern:

df['Country'] = df['Location'].str.split(r'(,\s)(\w+)$', n=1, expand=True)[2]

Output:

df['Country'].head(3)
Out[111]: 
0       usa
1    canada
2    canada
Name: Country, dtype: object
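Column [2] holds the country because of how splitting works when the pattern contains capturing groups: the captured text is kept in the result. A match on ", usa" therefore yields [rest, ', ', 'usa', '']. Below is a small sketch using two rows from the question. The pandas call mirrors the answer, and the intermediate name parts is only for illustration:

```python
import re

import pandas as pd

# Pattern from the answer; raw string so '\s' and '\w' reach the regex engine intact
pattern = r"(,\s)(\w+)$"

# re.split keeps the text matched by capturing groups in its output
print(re.split(pattern, "stockton, california, usa", maxsplit=1))
# ['stockton, california', ', ', 'usa', '']

df = pd.DataFrame({"Location": ["stockton, california, usa", "n/a, n/a, n/a"]})
parts = df["Location"].str.split(pattern, n=1, expand=True)
print(parts)

# Column 2 is the captured country. The "n/a" row has no match because '\w'
# does not match '/', so it is not split and column 2 is missing for it.
df["Country"] = parts[2]
```

Rows like "n/a, n/a, n/a" end up with a missing value rather than "n/a". Depending on how placeholders should be treated, that may or may not be what you want.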

Upvotes: 2

Imtinan Azhar

Reputation: 1753

data['Country'] = data['Location'].apply(lambda row: str(row).split(',')[-1].strip())

You can do it this way. Series.apply calls the function once for each value in the Location column. The lambda splits the string on commas, keeps the last piece (the country), and strips the space left after the comma. The results are stored in the new Country column.
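The same split-on-commas idea can also be written with pandas' vectorized .str methods instead of apply. Below is a sketch using three of the question's rows. Here .str[-1] takes the last element of each split list, and .str.strip() removes the space that a bare split(',') leaves in front of the country:

```python
import pandas as pd

# A few sample values from the question's Location column
data = pd.DataFrame({"Location": [
    "stockton, california, usa",
    "edmonton, alberta, canada",
    "n/a, n/a, n/a",
]})

# Split on commas, take the last piece, and trim surrounding whitespace
data["Country"] = data["Location"].str.split(",").str[-1].str.strip()
print(data["Country"].tolist())  # ['usa', 'canada', 'n/a']
```

Unlike the apply version with str(row), the .str methods leave missing values as NaN instead of turning them into the string "nan".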

Upvotes: 1
