Reputation: 135
I have a data frame containing data about visits to a particular site. Here's a sample:
visitsite | userid | timeonsite |
---|---|---|
facebook.com | kahy68 | 91973 |
facebook.com | jjsga12 | 2895 |
I need to create cohorts (groups) based on the timeonsite column (given in seconds). I also need to calculate how many users are in each cohort and what share of all users each cohort represents.
An output example:
visitdurationcohort | 1000-2000 | 2000-3000 | 3000-5000 | 5000+ |
---|---|---|---|---|
usersquantity | 1383 | 9973 | 3899 | 684 |
shareofusers | 7% | 60% | 30% | 3% |
So I found examples of how to create cohorts from a specific value (a month of registration, for example), but not how to create cohorts from a range.
I would appreciate any help :)
Upvotes: 0
Views: 49
Reputation: 584
As per @raymond-kwok:
import pandas as pd
bins = [0, 1000, 2000, 3000, 5000, float("inf")]  # open-ended last bin so visits longer than 10000 s are not dropped
df1 = df.groupby(pd.cut(df["timeonsite"], bins)).count()
df1 = df1[["userid"]]
df1["shareofusers"] = df1["userid"] / df1["userid"].sum()
df1 = df1.T
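For reference, here is a self-contained sketch of the same idea that also labels the cohorts the way the question's example output does. The sample rows and the `labels` list are made up for illustration; only the column names come from the question. It assumes each cohort should include its lower edge (hence `right=False`) and that visits shorter than 1000 seconds belong to no cohort, so they are left out of both the counts and the shares.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "visitsite": ["facebook.com"] * 6,
    "userid": ["u1", "u2", "u3", "u4", "u5", "u6"],
    "timeonsite": [1500, 2500, 2800, 4000, 91973, 2895],
})

# Bin edges and labels mirror the cohorts in the question; the last bin is open-ended.
bins = [1000, 2000, 3000, 5000, float("inf")]
labels = ["1000-2000", "2000-3000", "3000-5000", "5000+"]

# right=False makes each bin include its lower edge: [1000, 2000), [2000, 3000), ...
cohort = pd.cut(df["timeonsite"], bins=bins, labels=labels, right=False)

# observed=False keeps cohorts that happen to be empty, with a count of 0.
usersquantity = df.groupby(cohort, observed=False)["userid"].count()
shareofusers = usersquantity / usersquantity.sum()

# Put cohorts in columns and format the share as a whole-number percentage.
result = pd.DataFrame({
    "usersquantity": usersquantity,
    "shareofusers": shareofusers.map("{:.0%}".format),
}).T
result.columns.name = "visitdurationcohort"
print(result)
```

If a userid can appear in more than one row, `.nunique()` in place of `.count()` would count distinct users instead of visits.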
Upvotes: 1