Reputation: 65
I have a pandas dataframe that is in the following format:
DATE ID_1 ID_2
2017-01-20 J1234 1234567
2017-01-20 K2345 2143567
2017-01-21 K2345 1234567
2017-01-21 R2233 3840173
2017-01-21 J1234 9876543
2017-01-21 J1234 0092861
2017-01-21 R2233 3792462
2017-01-22 J1234 3451628
I am trying to get a distinct count of how many ID_2's fall into each ID_1 for each date to ultimately plot the dates (x-axis) by distinct ID_2's in each ID_1 (y-axis). So the Dataframe to be plotted would look like this:
DATE ID_1 Count_ID_2
2017-01-20 J1234 1
2017-01-20 K2345 1
2017-01-21 K2345 1
2017-01-21 R2233 2
2017-01-21 J1234 2
2017-01-22 J1234 1
With each ID_1 having a different line on the plot. Note that there are repeats in the ID_2 column. I am brand new to python and pandas and am trying to find the right code for this manipulation - I typically do this in excel, but the datafiles are so big now that it is way too slow. Thanks in advance for the help!
Upvotes: 3
Views: 49
Reputation: 323226
Try with value_counts
PS: New in pandas can accept two more columns
df.value_counts(['DATE','ID_1'])#.reset_index()
Out[9]:
DATE ID_1
2017-01-21 R2233 2
J1234 2
2017-01-22 J1234 1
2017-01-21 K2345 1
2017-01-20 K2345 1
J1234 1
dtype: int64
Upvotes: 5
Reputation: 153460
Try using groupby
with count
:
df.groupby(['DATE','ID_1'], as_index=False)['ID_2'].count()
Output:
DATE ID_1 ID_2
0 2017-01-20 J1234 1
1 2017-01-20 K2345 1
2 2017-01-21 J1234 2
3 2017-01-21 K2345 1
4 2017-01-21 R2233 2
5 2017-01-22 J1234 1
Upvotes: 4