Reputation: 485
I have a column "Location" from which I want to extract the country.
The column looks like:
data['Location'].head()
0 stockton, california, usa
1 edmonton, alberta, canada
2 timmins, ontario, canada
3 ottawa, ontario, canada
4 n/a, n/a, n/a
Name: Location, dtype: object
I want:
data['Country'].head(3)
0 usa
1 canada
2 canada
I've tried:
data['Country'] = data.Location.str.extract('(+[a-zA-Z])', expand=False)
data[['Location', 'Country']].sample(10)
which returns:
error: nothing to repeat at position 1
When I use '[a-zA-Z]+' as the pattern instead, it gives me the city.
Help would be appreciated. Thanks.
Upvotes: 1
Views: 70
Reputation: 6639
You can also use regex patterns:
df['Country'] = df['Location'].str.split(r'(,\s)(\w+)$', n=1, expand=True)[2]
Output:
df['Country'].head(3)
Out[111]:
0 usa
1 canada
2 canada
Name: Country, dtype: object
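On the error in the question: '(+[a-zA-Z])' fails because the + quantifier comes first inside the group, so there is nothing for it to repeat. The sketch below reproduces that with the standard re module and then tries an anchored pattern that captures whatever follows the last comma. The pattern and the names last_field, samples and countries are illustrative choices, not something taken from the question.

```python
import re

# The pattern from the question: '+' has nothing before it to repeat,
# so compiling it raises re.error.
try:
    re.compile('(+[a-zA-Z])')
    compile_error = None
except re.error as exc:
    compile_error = str(exc)

# Illustrative alternative: capture the text after the last comma,
# ignoring surrounding whitespace. '$' anchors the match to the end.
last_field = re.compile(r',\s*([^,]+?)\s*$')

samples = [
    'stockton, california, usa',
    'edmonton, alberta, canada',
    'n/a, n/a, n/a',
]

countries = []
for text in samples:
    match = last_field.search(text)
    countries.append(match.group(1) if match else None)

print(compile_error)
print(countries)
```

Since pandas string methods use the same regex engine, the same pattern should also work as data['Location'].str.extract(r',\s*([^,]+?)\s*$', expand=False). Rows like 'n/a, n/a, n/a' come back as 'n/a' and would still need separate handling.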
Upvotes: 2
Reputation: 1753
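data['Country'] = data['Location'].apply(lambda loc: str(loc).split(',')[-1].strip())
You can do this with apply. Series.apply calls the lambda on each value of the Location column; the lambda splits the string on commas, takes the last piece (the country), and strips the leading space left over from the ', ' separator. The result is saved into a new Country column.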
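The same idea can probably be written without a Python-level lambda by using the vectorized string methods. This is only a sketch on a small made-up frame that mirrors the question's sample rows; treating 'n/a' as missing is an assumption about what is wanted, not something stated in the question.

```python
import pandas as pd

# Small stand-in for the question's data (rows copied from the sample output).
data = pd.DataFrame({'Location': [
    'stockton, california, usa',
    'edmonton, alberta, canada',
    'n/a, n/a, n/a',
]})

# Split once from the right, keep the last piece, and trim whitespace.
country = data['Location'].str.rsplit(',', n=1).str[-1].str.strip()

# Assumption: 'n/a' means the country is unknown, so mark it as missing.
data['Country'] = country.mask(country == 'n/a')

print(data[['Location', 'Country']])
```

Splitting from the right with n=1 keeps this working even if the city or state part contains extra commas.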
Upvotes: 1