Reputation: 2652
I'm trying to get counts of unique counts of unique values for a column in pandas dataframe.
Sample data bellow:
In [3]: df = pd.DataFrame([[1, 1], [2, 1], [3, 2], [4, 3], [5, 1]], columns=['AppointmentId', 'PatientId'])
In [4]: df
Out[4]:
AppointmentId PatientId
0 1 1
1 2 1
2 3 2
3 4 3
4 5 1
Actual dataset has over 50000 unique values of PatientId
. I want to visualize appointment count per patient, but simply grouping by PatientId
and getting sizes of groups doesn't work well for plotting, because that would be 50000 bars.
For that reason I'm trying to plot how many patients had a specific number of appointments plotted, instead of plotting number of appointments against PatientId
.
Based on sample data above I want to get something like this:
AppointmentCount PatientCount
0 1 2
1 3 3
I approach this by first grouping on PatientId
and getting group sizes, drop PatientId
, and group sizes, but I can't find a way to extract it after grouping.
In [24]: appointment_counts = df.groupby('PatientId').size()
In [25]: appointment_counts
Out[25]:
PatientId
1 3
2 1
3 1
dtype: int64
In [26]: type(appointment_counts)
Out[26]: pandas.core.series.Series
Upvotes: 0
Views: 1778
Reputation: 323226
After your groupby
adding value_counts
df.groupby('PatientId').size().value_counts()
Out[877]:
1 2
3 1
dtype: int64
Then you can add rename
df.groupby('PatientId').size().value_counts().reset_index().rename(columns={'index':'Aid',0:'Pid'})
Out[883]:
Aid Pid
0 1 2
1 3 1
Upvotes: 1