Reputation: 8291
I have this dataset:
userid sub_id event
1 NaN {'score':25, 'sub_id':5}
1 5 {'score':1}
When sub_id
column is NaN, I want to extract this information from event
column using the following code:
df['sub_id'] = df.apply(lambda row:
row['event'].split('sub_id')[1]
if pd.isnull(row['sub_id'])
else row['sub_id'])
However, I am receiving this error: KeyError: ('sub_id', u'occurred at index index')
I am trying to obtain this dataframe:
userid sub_id event
1 5 {'score':25, 'sub_id':5}
1 5 {'score':1}
Any idea for the error, or any suggestions for a different solution?
UPDATE
I need to extract the value that resides in a nested dict element:
event
{u'POST': {u'{"options_selected":{"Ideas":"0"},"criterion_feedback":{},"overall_feedback":"Feedback_text_goes_here_1"}': [u'']}, u'GET': {}}
I am using this code:
df['POST'] = df['event'].apply(pd.Series)['POST']
which creates the following column:
POST
{u'{"options_selected":{"Ideas":"0"},"criterion_feedback":{},"overall_feedback":"Feedback_text_goes_here_1"}': [u'']}
However, I need to get the overall_feedback
value. Because of the formatting of the POST
field, the following code does not work:
df['POST'].apply(pd.Series)['overall_feedback']
It throws this error KeyError: 'overall_feedback'
Any ideas?
Upvotes: 2
Views: 93
Reputation: 863166
You can use combine_first
or fillna
:
print (type(df.loc[0, 'event']))
<class 'dict'>
df['sub_id'] = df['sub_id'].combine_first(df.event.apply(lambda x: x['score']))
#df['sub_id'] = df['sub_id'].fillna(df.event.apply(lambda x: x['score']))
print (df)
event sub_id userid
0 {'sub_id': 5, 'score': 5} 5.0 1
1 {'score': 1} 5.0 1
EDIT: If nested dict, faster solutio is use double DataFame
constructor and slowier solution double apply
with Series
:
df = pd.DataFrame({'userid':[1,1],
'sub_id':[np.nan, 5],
'event':[{'post':{'score':25, 'sub_id':5}},{'post':{'score':1}} ]})
print (df)
event sub_id userid
0 {'post': {'sub_id': 5, 'score': 25}} NaN 1
1 {'post': {'score': 1}} 5.0 1
s = pd.DataFrame(pd.DataFrame(df['event'].values.tolist())['post'].values.tolist())['score']
print (s)
0 25
1 1
Name: score, dtype: int64
s = df['event'].apply(pd.Series)['post'].apply(pd.Series)['score']
print (s)
0 25.0
1 1.0
Name: score, dtype: float64
df['sub_id'] = df['sub_id'].combine_first(s)
print (df)
event sub_id userid
0 {'post': {'sub_id': 5, 'score': 25}} 25.0 1
1 {'post': {'score': 1}} 5.0 1
EDIT1:
For converting to dict is possible use:
import ast, yaml
df = pd.DataFrame({'userid':[1,1],
'sub_id':[np.nan, 5],
'event':[{'post':{'score':25, 'sub_id':5}},{'post':{'score':1}} ]})
df.event = df.event.astype(str)
print (type(df.loc[0, 'event']))
<class 'str'>
df['event'] = df['event'].apply(ast.literal_eval)
#df['event'] = df['event'].apply(yaml.load)
print (df)
event sub_id userid
0 {'post': {'sub_id': 5, 'score': 25}} NaN 1
1 {'post': {'score': 1}} 5.0 1
print (type(df.loc[0, 'event']))
<class 'dict'>
EDIT2:
d = {u'{"options_selected":{"Ideas":"0"},"criterion_feedback":{},"overall_feedback":"Feedback_text_goes_here_1"}': [u'']}
d1 = {u'{"options_selected":{"Ideas":"2"},"criterion_feedback":{},"overall_feedback":"Feedback_text_goes_here_2"}': [u'']}
df = pd.DataFrame({'userid':[1,1],
'sub_id':[np.nan, 5],
'event':[d,d1]})
df['event'] = df['event'].astype(str).apply(yaml.load).apply(lambda x : yaml.load(list(x.keys())[0]))
print (type(df.event.iloc[0]))
<class 'dict'>
print (df.event.apply(pd.Series)['overall_feedback'])
0 Feedback_text_goes_here_1
1 Feedback_text_goes_here_2
Name: overall_feedback, dtype: object
Upvotes: 2