Muftawo Omar
Muftawo Omar

Reputation: 68

Create sparse dataframe from a pandas dataframe with list values

I have a pandas Dataframe of id, dates, and payments arranged like below. dates are recorded in months and correspond to the payments of the same index in the row.

ID          Dates                                  payments 
A    ['02-2010','05-2010']                        [45,50]
B    ['02-2010','04-2010','06-2010']              [42,48,52]
C    ['03-2010','04-2010','05-2010','06-2010']    [39,38,39,42]

I would like to create a sparse Dataframe from this, with ID and Date as my dimensions, filling cells with 0 when there's no payment for that month. results should look like this.

    '02-2010' '03-2010' '04-2010' '05-2010' '06-2010'
A      45         0        0         50        0
B      42         0        48        0         52
C      0          39       38        39        42

Upvotes: 0

Views: 125

Answers (2)

Muftawo Omar
Muftawo Omar

Reputation: 68

This worked for me.

#create dataframe
df_train = pd.DataFrame({ 'ID': ["A", "B","C"],
                    'Dates': [['02-2010','05-2010']  , ['02-2010','04-2010','06-2010']  , ['03-2010','04-2010','05-2010','06-2010']],
    
                   'payments': [[45,50],[42,48,52], [39,38,39,42]],
                   "m1":[45,58,74],
                   "m2":[4,8,7]
                   })

#explode and set ID as index
df=df.set_index(['ID']).apply(pd.Series.explode).reset_index()

#create pivot table 
df =df.pivot(index="ID", columns="Dates", values="payments").fillna(0)

print(df)

prints:

Dates  02-2010  03-2010  04-2010  05-2010  06-2010
ID                                                
A           45        0        0       50        0
B           42        0       48        0       52
C            0       39       38       39       42

Upvotes: 0

Andrej Kesely
Andrej Kesely

Reputation: 195408

Try:


# transform dates, payments to python list

#from ast import literal_eval

#df["Dates"] = df["Dates"].apply(literal_eval)
#df["payments"] = df["payments"].apply(literal_eval)



df = df.explode(["Dates", "payments"])
print(df.pivot(index="ID", columns="Dates", values="payments").fillna(0))

Prints:

Dates  02-2010  03-2010  04-2010  05-2010  06-2010
ID                                                
A           45        0        0       50        0
B           42        0       48        0       52
C            0       39       38       39       42

Upvotes: 2

Related Questions