Reputation: 53
I have a DataFrame in which one column looks like this:
data['countries']
"[{'iso_3166_1': 'KR', 'name': 'South Korea'}]"
"[{'iso_3166_1': 'US', 'name': 'United States of America'}]"
How can I extract ONLY the country names ('South Korea', 'United States of America', etc.)?
Upvotes: 0
Views: 78
Reputation: 39800
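import json
import pandas as pd
# Skip missing values (np.isnan raises TypeError on strings), then parse each string as JSON
countries = [json.loads(c.replace("'", '"')) for c in data['countries'] if pd.notna(c)]
# Take the name from the first entry of each parsed list
country_names = [c[0]['name'] for c in countries]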
And the output will be:
>>> ['South Korea', 'United States of America']
Upvotes: 2
Reputation: 3770
This should work:
data['countries'] = data['countries'].apply(lambda x: eval(x))
data['countries'].apply(lambda x: x[0]['name'])
Output
0 South Korea
1 United States of America
Name: countries, dtype: object
list(data['countries'].apply(lambda x: x[0]['name']))
Output
['South Korea', 'United States of America']
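If the strings come from a source you don't fully control, a possibly safer variant is ast.literal_eval, which only evaluates Python literals rather than arbitrary expressions. The following is a minimal sketch, not a drop-in replacement: the sample frame, the missing-value row, and the helper name parse_names are assumptions made for illustration.

```python
import ast
import pandas as pd

# Sample frame mirroring the question; the None row is an assumed missing value.
data = pd.DataFrame({
    "countries": [
        "[{'iso_3166_1': 'KR', 'name': 'South Korea'}]",
        "[{'iso_3166_1': 'US', 'name': 'United States of America'}]",
        None,
    ]
})

def parse_names(cell):
    """Hypothetical helper: return every country name found in one cell."""
    if pd.isna(cell):
        return []  # treat a missing value as "no countries"
    # literal_eval parses the string as a Python literal (here, a list of dicts)
    return [entry["name"] for entry in ast.literal_eval(cell)]

# One list of names per row, leaving the original column untouched
names = data["countries"].apply(parse_names)

# Flatten into a single list across all rows
flat = [name for row in names for name in row]
print(flat)
```

Returning a list per row also covers rows that hold more than one country, which indexing with x[0] would silently ignore.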
Upvotes: 0
Reputation: 10779
If you don't want to change your DataFrame but just parse the content of the string it contains, you could use split.
>>> a = "[{'iso_3166_1': 'KR', 'name': 'South Korea'}]"
>>> a.split("'name': ")[1].split("'")[1]
'South Korea'
or:
def f(a):
    return a.split("'name': ")[1].split("'")[1]
countries = [f(a) for a in data['countries']]
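The same idea can be written with a regular expression, which also picks up every name when a row lists several countries. This is only a sketch under the assumption that names are always single-quoted and contain no apostrophes; NAME_PATTERN and extract_names are illustrative names, not part of any library.

```python
import re

# Capture whatever sits between the single quotes that follow 'name':
# Assumes the name itself contains no apostrophe.
NAME_PATTERN = re.compile(r"'name': '([^']*)'")

def extract_names(cell):
    """Hypothetical helper: return all country names found in one string."""
    return NAME_PATTERN.findall(cell)

a = ("[{'iso_3166_1': 'KR', 'name': 'South Korea'}, "
     "{'iso_3166_1': 'US', 'name': 'United States of America'}]")
print(extract_names(a))
```

A string with no 'name' key simply yields an empty list, whereas the split version raises an IndexError.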
Upvotes: 1