Reputation: 181
Good afternoon all, i have a dataframe below.
UserId Application
1 apple
1 orange
1 apple
1 pear
2 apple
2 orange
2 pear
2 grapefruit
3 apple
3 grapefruit
3 apple
1 apple
I am trying to create a list that counts every unique application to the percent of UserIDs that have them. As an example of the output the table is below
Application Percentage
apple 100
orange 66
pear 66
grapefruit 66
This output is telling me that for every user, apple appears 100 percent of the time orange appears 66 percent of the time. etc etc but somehow i cannot get this to work.
My code below works but produces 3.0 as a value.
dfsearch['Percentage'] = (len(dfsearch.Application.value_counts())/len(dfsearch.UserID.value_counts()))
dfsearch
this is probably incorrect because its not a list, but that is why i need help :)
Upvotes: 0
Views: 34
Reputation: 13998
Use groupby()
+ nunique()
:
dfsearch.groupby("Application").UserID.nunique()/dfsearch.UserID.nunique()
#Application
#apple 1.000000
#grapefruit 0.666667
#orange 0.666667
#pear 0.666667
#Name: UserId, dtype: float64
Upvotes: 0
Reputation: 9941
You can start with dropping duplicate records with drop_duplicates
, then call value_counts
, divide by the number of unique users and multiply by 100:
x = df.drop_duplicates()['Application'].value_counts() / len(df['UserId'].unique()) * 100
x
Output:
apple 100.000000
pear 66.666667
grapefruit 66.666667
orange 66.666667
Name: Application, dtype: float64
And then convert it to DataFrame:
x.astype(int).to_frame('Percentage').rename_axis('Application').reset_index()
Output:
Application Percentage
0 apple 100
1 pear 66
2 grapefruit 66
3 orange 66
Upvotes: 3