Reputation: 68
I have a pandas Dataframe of id, dates, and payments arranged like below. dates are recorded in months and correspond to the payments of the same index in the row.
ID Dates payments
A ['02-2010','05-2010'] [45,50]
B ['02-2010','04-2010','06-2010'] [42,48,52]
C ['03-2010','04-2010','05-2010','06-2010'] [39,38,39,42]
I would like to create a sparse Dataframe from this, with ID and Date as my dimensions, filling cells with 0 when there's no payment for that month. results should look like this.
'02-2010' '03-2010' '04-2010' '05-2010' '06-2010'
A 45 0 0 50 0
B 42 0 48 0 52
C 0 39 38 39 42
Upvotes: 0
Views: 125
Reputation: 68
This worked for me.
#create dataframe
df_train = pd.DataFrame({ 'ID': ["A", "B","C"],
'Dates': [['02-2010','05-2010'] , ['02-2010','04-2010','06-2010'] , ['03-2010','04-2010','05-2010','06-2010']],
'payments': [[45,50],[42,48,52], [39,38,39,42]],
"m1":[45,58,74],
"m2":[4,8,7]
})
#explode and set ID as index
df=df.set_index(['ID']).apply(pd.Series.explode).reset_index()
#create pivot table
df =df.pivot(index="ID", columns="Dates", values="payments").fillna(0)
print(df)
prints:
Dates 02-2010 03-2010 04-2010 05-2010 06-2010
ID
A 45 0 0 50 0
B 42 0 48 0 52
C 0 39 38 39 42
Upvotes: 0
Reputation: 195408
Try:
# transform dates, payments to python list
#from ast import literal_eval
#df["Dates"] = df["Dates"].apply(literal_eval)
#df["payments"] = df["payments"].apply(literal_eval)
df = df.explode(["Dates", "payments"])
print(df.pivot(index="ID", columns="Dates", values="payments").fillna(0))
Prints:
Dates 02-2010 03-2010 04-2010 05-2010 06-2010
ID
A 45 0 0 50 0
B 42 0 48 0 52
C 0 39 38 39 42
Upvotes: 2