mpwill
mpwill

Reputation: 65

Finding Distinct Count within Groups

I have a pandas dataframe that is in the following format:

DATE        ID_1    ID_2
2017-01-20  J1234   1234567 
2017-01-20  K2345   2143567     
2017-01-21  K2345   1234567
2017-01-21  R2233   3840173
2017-01-21  J1234   9876543 
2017-01-21  J1234   0092861
2017-01-21  R2233   3792462
2017-01-22  J1234   3451628

I am trying to get a distinct count of how many ID_2's fall into each ID_1 for each date to ultimately plot the dates (x-axis) by distinct ID_2's in each ID_1 (y-axis). So the Dataframe to be plotted would look like this:

DATE        ID_1    Count_ID_2
2017-01-20  J1234   1   
2017-01-20  K2345   1
2017-01-21  K2345   1   
2017-01-21  R2233   2
2017-01-21  J1234   2
2017-01-22  J1234   1

With each ID_1 having a different line on the plot. Note that there are repeats in the ID_2 column. I am brand new to python and pandas and am trying to find the right code for this manipulation - I typically do this in excel, but the datafiles are so big now that it is way too slow. Thanks in advance for the help!

Upvotes: 3

Views: 49

Answers (2)

BENY
BENY

Reputation: 323226

Try with value_counts PS: New in pandas can accept two more columns

df.value_counts(['DATE','ID_1'])#.reset_index()

Out[9]: 
DATE        ID_1 
2017-01-21  R2233    2
            J1234    2
2017-01-22  J1234    1
2017-01-21  K2345    1
2017-01-20  K2345    1
            J1234    1
dtype: int64

Upvotes: 5

Scott Boston
Scott Boston

Reputation: 153460

Try using groupby with count:

df.groupby(['DATE','ID_1'], as_index=False)['ID_2'].count()

Output:

         DATE   ID_1  ID_2
0  2017-01-20  J1234     1
1  2017-01-20  K2345     1
2  2017-01-21  J1234     2
3  2017-01-21  K2345     1
4  2017-01-21  R2233     2
5  2017-01-22  J1234     1

Upvotes: 4

Related Questions