Finding Distinct Count within Groups

Question

I have a pandas dataframe that is in the following format:

DATE        ID_1    ID_2
2017-01-20  J1234   1234567 
2017-01-20  K2345   2143567     
2017-01-21  K2345   1234567
2017-01-21  R2233   3840173
2017-01-21  J1234   9876543 
2017-01-21  J1234   0092861
2017-01-21  R2233   3792462
2017-01-22  J1234   3451628

I am trying to get a distinct count of how many ID_2's fall into each ID_1 for each date to ultimately plot the dates (x-axis) by distinct ID_2's in each ID_1 (y-axis). So the Dataframe to be plotted would look like this:

DATE        ID_1    Count_ID_2
2017-01-20  J1234   1   
2017-01-20  K2345   1
2017-01-21  K2345   1   
2017-01-21  R2233   2
2017-01-21  J1234   2
2017-01-22  J1234   1

With each ID_1 having a different line on the plot. Note that there are repeats in the ID_2 column. I am brand new to python and pandas and am trying to find the right code for this manipulation - I typically do this in excel, but the datafiles are so big now that it is way too slow. Thanks in advance for the help!

Scott Boston · Accepted Answer

Try using groupby with count:

df.groupby(['DATE','ID_1'], as_index=False)['ID_2'].count()

Output:

         DATE   ID_1  ID_2
0  2017-01-20  J1234     1
1  2017-01-20  K2345     1
2  2017-01-21  J1234     2
3  2017-01-21  K2345     1
4  2017-01-21  R2233     2
5  2017-01-22  J1234     1

Finding Distinct Count within Groups

Answers (2)

Related Questions