Reputation: 3
I have two dataframe
df1# - This has userid and the products bought by the corresponding user.
Userid, Productid, Ordered_Qty
user1 1 1
user1 2 1
user1 3 1
user2 4 1
user2 5 1
user2 3 1
user3 2 1
user3 4 1
user3 5 1
df2# - This has product master
Productid, some_attribute
1 "xsff"
2 "xsff"
3 "xsff"
4 "xsff"
5 "xsff"
Now, I would like to create new dataframe that looks like below,
Row - Userid from df1
columns - Productid from df2
New dataframe# - [i,j] value corresponds to the df1# Ordered_Qty(any qty).Basically I wanted to have all the values from df2 - Productid but fill the values in new dataframe from the df1. if df1 doesn't have entry for userid(row) & product(col) combination then, I like to keep zero in it
1 2 3 4 5
user1 1 1 1 0 0
user2 0 0 1 1 1
user3 0 1 0 1 1
To achieve this, What I did is
new_dataframe = pd.dataframe(0,index=df1.Userid.unique(),columns=df2.Productid.unique())
for index,row in df1.iterrows():
userid = row['Userid']
productid = row['Productid']
ord_qty = row['Ordered_Qty']
new_dataframe.loc[userid][productid] = ord_qty
However this way of doing taking more time and eventually resulting in memory error. Is there better way to do it?
Upvotes: 0
Views: 69
Reputation: 13255
Use pivot
or pivot_table
or crosstab
:
df3 = df.pivot(index='Userid', columns='Productid').fillna(0, downcast='infer')
df3.columns = df3.columns.get_level_values(1)
Or:
df3 = df.pivot_table(index='Userid', columns='Productid', fill_value=0)
df3.columns = df3.columns.get_level_values(1)
Or:
df3 = pd.crosstab(df['Userid'], df['Productid'])
And then:
df3 = df3.reindex(columns=df2.Productid.unique(), fill_value=0)
Upvotes: 1
Reputation: 1081
Please use Crosstab:
pandas.crosstab(df['userid'], df['productid'])
Upvotes: 1