Reputation: 2259
I am still learning pandas and have a pandas dataframe with 2 columns as shown below:
actual label  pred label
0             -1
0             -1
1             [1, 0.34496911461303364]
1             -1
If a value in 'pred label' is a list, I would like to keep its first element (here 1) in 'pred label' and move its second element into a new column, 'pred score'.
Upvotes: 1
Views: 867
Reputation: 8493
Here's another approach:
import pandas as pd
d = {"actual label" : [0,0,1,1], "pred label" : [-1,-1,[1, 0.34496911461303364],-1]}
df = pd.DataFrame(d)
Assuming "pred label" is of type object, build a boolean mask for the rows that hold a list, then use the .str accessor, which also indexes into list values, to pull out each element row by row:
mask = df["pred label"].str.len() == 2
# .str[i] indexes element-wise, so this works for any number of list rows
df.loc[mask, "pred score"] = df.loc[mask, "pred label"].str[1]
df.loc[mask, "pred label"] = df.loc[mask, "pred label"].str[0]
print(df)
   actual label pred label  pred score
0             0         -1         NaN
1             0         -1         NaN
2             1          1    0.344969
3             1         -1         NaN
Upvotes: 1
Reputation: 1161
It's probably not a great idea to store the DataFrame in this initial format in the first place if it can be avoided. Here is a solution:
import pandas as pd

df = pd.DataFrame({'actual_label': [0, 0, 1, 1],
                   'pred_label': [-1, -1, [1, 0.34496911461303364], -1]})

def split_label(v):
    if isinstance(v, list):
        return pd.Series(v, index=['pred_label', 'pred_score'])
    return pd.Series(v, index=['pred_label'])

new_pred = df.pred_label.apply(split_label)
df_new = pd.concat([df.actual_label, new_pred], axis=1)
The final output looks like this:
   actual_label  pred_label  pred_score
0             0          -1         NaN
1             0          -1         NaN
2             1           1    0.344969
3             1          -1         NaN
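If you do control how the data is produced, one option is to normalize each prediction into a (label, score) pair before building the DataFrame, so the mixed column never exists. A minimal sketch, assuming the raw predictions arrive as a plain Python list; raw_preds, actual and normalize are hypothetical names used only for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical raw model output: either a bare label or [label, score].
raw_preds = [-1, -1, [1, 0.34496911461303364], -1]
actual = [0, 0, 1, 1]


def normalize(pred):
    """Return a (label, score) pair; score is NaN when none was given."""
    if isinstance(pred, list):
        label, score = pred
        return label, score
    return pred, np.nan


# Unzip the pairs into two parallel tuples, one per output column.
labels, scores = zip(*(normalize(p) for p in raw_preds))

df_clean = pd.DataFrame({
    "actual_label": actual,
    "pred_label": labels,
    "pred_score": scores,
})
print(df_clean)
```

Because every cell is a scalar from the start, no splitting step is needed afterwards.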
Upvotes: 2
Reputation: 109546
You can use a list comprehension together with isinstance to test whether each object in 'pred label' is a list.
df['pred score'] = [c[1] if isinstance(c, list) else None for c in df['pred label']]
df['pred label'] = [c[0] if isinstance(c, list) else c for c in df['pred label']]
>>> df
   actual label  pred label  pred score
0             0          -1         NaN
1             0          -1         NaN
2             1           1    0.344969
3             1          -1         NaN
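One thing to watch with this approach is the order of the two assignments: 'pred score' has to be computed first, because the second line replaces the lists it reads from. A small sketch of the difference, using a shortened score value and illustrative names (make_df, wrong, right) that are not part of the answer:

```python
import pandas as pd


def make_df():
    # Same shape as the question's data, with a shortened score for readability.
    return pd.DataFrame({"actual label": [0, 0, 1, 1],
                         "pred label": [-1, -1, [1, 0.345], -1]})


# Wrong order: the lists are overwritten before the scores are read.
wrong = make_df()
wrong["pred label"] = [c[0] if isinstance(c, list) else c for c in wrong["pred label"]]
wrong["pred score"] = [c[1] if isinstance(c, list) else None for c in wrong["pred label"]]

# Right order: read the scores first, then flatten the labels.
right = make_df()
right["pred score"] = [c[1] if isinstance(c, list) else None for c in right["pred label"]]
right["pred label"] = [c[0] if isinstance(c, list) else c for c in right["pred label"]]

print(wrong["pred score"].isna().all())  # every score is missing
print(right)
```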
Upvotes: 1
Reputation: 76927
Here's one way to achieve it
In [74]: df
Out[74]:
   actual label  pred label
0             0          -1
1             0          -1
2             1  [1, 0.344]
3             1          -1
Use apply with isinstance(x, list) to check whether each value is a list, padding scalars to [x, np.nan] (numpy imported as np), and then apply(pd.Series) to split the lists into columns:
In [75]: (df['pred label'].apply(lambda x: x if isinstance(x, list) else [x, np.nan])
                          .apply(pd.Series))
Out[75]:
   0      1
0 -1    NaN
1 -1    NaN
2  1  0.344
3 -1    NaN
You could assign these two columns back to df under the names ['pred-lab', 'pred-score']:
In [76]: df[['pred-lab', 'pred-score']] = (df['pred label'].apply(lambda x: x if isinstance(x, list) else [x, np.nan])
                                                           .apply(pd.Series))
The final df looks like this:
In [77]: df
Out[77]:
   actual label  pred label  pred-lab  pred-score
0             0          -1        -1         NaN
1             0          -1        -1         NaN
2             1  [1, 0.344]         1       0.344
3             1          -1        -1         NaN
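As a possible alternative to the second apply(pd.Series), which builds one Series per row, the padded lists can be handed to the DataFrame constructor in one go. This is only a sketch on the same sample data; rows, split and result are names I made up, and dropping the original column is an optional extra step:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"actual label": [0, 0, 1, 1],
                   "pred label": [-1, -1, [1, 0.344], -1]})

# Pad scalars to [label, NaN] so every row has exactly two elements.
rows = [x if isinstance(x, list) else [x, np.nan] for x in df["pred label"]]

# Build both new columns at once; reuse df's index so the join lines up.
split = pd.DataFrame(rows, columns=["pred-lab", "pred-score"], index=df.index)

# Optional: replace the mixed column with the two clean ones.
result = df.drop(columns="pred label").join(split)
print(result)
```

On a frame this small the difference will not matter; it may be worth trying if the per-row apply turns out to be slow on larger data.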
Upvotes: 2