Reputation: 1387
I have data like this
user product month
A 1011 2
B 2011 2
B 2034 3
C 2078 4
D 1098 1
A 9087 3
A 5017 4
D 1872 1
A 1011 4
and so on
I want to aggregate this data such that, convert month column into row, as in they become the headers, and I can aggregate 'product' with unique values under each user and in which month they were accessed. So output should be something like this
user product 1 2 3 4 5 6 7 8 9 10 11 12
A 3 0 1 1 2 0 0 0 0 0 0 0 0
B 2 0 1 1 0 0 0 0 0 0 0 0 0
C 1 0 0 0 1 0 0 0 0 0 0 0 0
D 2 2 0 0 0 0 0 0 0 0 0 0 0
I think I can use Pivot to get this done but I am not able to aggregate under the month header.
I want to go a step ahead, and divide the product with each month to find utilization rate each month, for example:
user product 1 2 3 4 5 6 7 8 9 10 11 12
A 3 0 0.33 0.33 0.66 0 0 0 0 0 0 0 0
B 2 0 0.50 0.50 0 0 0 0 0 0 0 0 0
C 1 0 0 0 1 0 0 0 0 0 0 0 0
D 2 1 0 0 0 0 0 0 0 0 0 0 0
Upvotes: 1
Views: 48
Reputation: 323306
Using groupby
+ nunqiue
get the first two columns, then we using get_dummies
get the others
s1=df.groupby('user').product.nunique()
s2=df.set_index('user').month.astype(str).str.get_dummies().sum(level=0)
pd.concat([s1,s2.div(s1,axis=0)],1)
Out[121]:
product 1 2 3 4
user
A 3 0.0 0.333333 0.333333 0.666667
B 2 0.0 0.500000 0.500000 0.000000
C 1 0.0 0.000000 0.000000 1.000000
D 2 1.0 0.000000 0.000000 0.000000
Upvotes: 1