GNMO11

Reputation: 2259

Pandas split a column value to new column if list

I am still learning pandas and have a pandas dataframe with 2 columns as shown below:

actual label          pred label
     0                    -1
     0                    -1
     1           [1, 0.34496911461303364]
     1                    -1

If a value in 'pred label' is a list, I would like to keep its first element (here 1) in that column and move its second element into a new column, 'pred score'.

Upvotes: 1

Views: 867

Answers (4)

Bob Haffner

Reputation: 8493

Here's another approach

import pandas as pd

d = {"actual label" : [0,0,1,1], "pred label" : [-1,-1,[1, 0.34496911461303364],-1]}
df = pd.DataFrame(d)

Assuming "pred label" is of type object, use boolean indexing together with the .str accessor, which indexes into each list element-wise:

# True only for rows that hold a two-item list (scalars give NaN, and NaN == 2 is False)
is_list = df["pred label"].str.len() == 2
# .str[i] picks item i from each list, so every matching row gets its own values
df.loc[is_list, "pred score"] = df.loc[is_list, "pred label"].str[1]
df.loc[is_list, "pred label"] = df.loc[is_list, "pred label"].str[0]
print(df)
   actual label pred label  pred score
0             0         -1         NaN
1             0         -1         NaN
2             1          1    0.344969
3             1         -1         NaN
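
To check how this behaves when more than one row holds a list, here is a minimal sketch with hypothetical data. It builds the mask with isinstance (a swapped-in check, not the .str.len() test above) and uses .str[i] to index into each list, so every row keeps its own values.

```python
import pandas as pd

# Hypothetical data with two list rows (not from the question)
df = pd.DataFrame({
    "actual label": [0, 1, 1],
    "pred label": [-1, [1, 0.9], [0, 0.2]],
})

# Boolean mask: True where the cell holds a list
is_list = df["pred label"].map(lambda v: isinstance(v, list))

# Extract both parts before overwriting the column
scores = df.loc[is_list, "pred label"].str[1].astype(float)
labels = df.loc[is_list, "pred label"].str[0]

df["pred score"] = scores               # aligns on the index; other rows become NaN
df.loc[is_list, "pred label"] = labels  # replace each list with its first item
print(df)
```

Each list row ends up with its own label and score, and the scalar rows are left unchanged with NaN as their score.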

Upvotes: 1

MangoHands

Reputation: 1161

If it can be avoided, it's probably better not to store scalars and lists mixed in one column in the first place. Here is a solution:

import pandas as pd

df = pd.DataFrame({'actual_label' : [0,0,1,1],
                  'pred_label' : [-1,-1, [1, 0.34496911461303364], -1]})

def split_label(v):
    # A list becomes two named fields; a scalar becomes the label only
    if isinstance(v, list):
        return pd.Series(v, index=['pred_label', 'pred_score'])
    return pd.Series(v, index=['pred_label'])

new_pred = df.pred_label.apply(split_label)
df_new = pd.concat([df.actual_label, new_pred], axis=1)

The final output looks like this:

   actual_label  pred_label  pred_score
0             0          -1         NaN
1             0          -1         NaN
2             1           1    0.344969
3             1          -1         NaN
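
If the raw predictions are available before the DataFrame is built, one way to avoid the mixed column is to normalize each item into a (label, score) pair first. This is only a sketch: raw_preds, actual and normalize are hypothetical names, not something from the question.

```python
import math
import pandas as pd

# Hypothetical raw model output: either a scalar label or [label, score]
raw_preds = [-1, -1, [1, 0.34496911461303364], -1]
actual = [0, 0, 1, 1]

def normalize(pred):
    """Return (label, score); score is NaN when the prediction has no score."""
    if isinstance(pred, list):
        label, score = pred
        return label, score
    return pred, math.nan

# Every row is now a two-item tuple, so the columns get plain numeric dtypes
pairs = [normalize(p) for p in raw_preds]
df = pd.DataFrame(pairs, columns=["pred_label", "pred_score"])
df.insert(0, "actual_label", actual)
print(df)
```

Built this way, pred_label and pred_score are ordinary numeric columns and no splitting is needed afterwards.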

Upvotes: 2

Alexander

Reputation: 109546

You can use a list comprehension together with isinstance to test whether each object in 'pred label' is a list.

df['pred score'] = [c[1] if isinstance(c, list) else None for c in df['pred label']]
df['pred label'] = [c[0] if isinstance(c, list) else c for c in df['pred label']]
>>> df
   actual label  pred label  pred score
0             0          -1         NaN
1             0          -1         NaN
2             1           1    0.344969
3             1          -1         NaN

Upvotes: 1

Zero

Reputation: 76927

Here's one way to achieve it

In [74]: df
Out[74]:
  actual label  pred label
0            0          -1
1            0          -1
2            1  [1, 0.344]
3            1          -1

Use apply with isinstance(x, list) to keep list values as they are and pad scalars to [x, np.nan], then apply(pd.Series) to split each list into columns (np is numpy, imported with import numpy as np)

In [75]: (df['pred label'].apply(lambda x: x if isinstance(x,list) else [x, np.nan])
                          .apply(pd.Series))
Out[75]:
   0      1
0 -1    NaN
1 -1    NaN
2  1  0.344
3 -1    NaN

You could assign these two columns back to df with columns ['pred-lab', 'pred-score']

In [76]: df[['pred-lab', 'pred-score']] = (df['pred label'].apply(lambda x: x if isinstance(x,list) else [x, np.nan])
                                                           .apply(pd.Series))

Final df looks like

In [77]: df
Out[77]:
  actual label  pred label  pred-lab  pred-score
0            0          -1        -1         NaN
1            0          -1        -1         NaN
2            1  [1, 0.344]         1       0.344
3            1          -1        -1         NaN
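
Calling pd.Series once per row can be slow on large frames. As a possible alternative, the padded lists can be passed to a single pd.DataFrame call. This is a sketch of that swapped-in construction, not the method used above; the names padded and split are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "actual label": [0, 0, 1, 1],
    "pred label": [-1, -1, [1, 0.344], -1],
})

# Pad scalars to [label, NaN] so every row is a two-item list
padded = [x if isinstance(x, list) else [x, np.nan] for x in df["pred label"]]

# Build both new columns in one DataFrame construction, keeping df's index
split = pd.DataFrame(padded, columns=["pred-lab", "pred-score"], index=df.index)
df = df.join(split)
print(df)
```

The original 'pred label' column is kept here, as in the final df above; drop it with df.drop(columns="pred label") if it is no longer needed.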

Upvotes: 2
