FunnyChef
FunnyChef

Reputation: 1946

Python pandas, aggregate values by date in a dataframe

I have a pandas data frame that contains thousands of dates and IDs as follows:

2/1/18  123
2/1/18  123
2/1/18  456
2/1/18  789

I also have a short list of only a few IDs such as:

 ids=['123','456','909']

I need to loop through the list and get a count of each value in the list by date. So the result should look like;

2/1/18 123 2
       456 1
       909 0  

I can do a groupby easy enough on the data frame but that will return all ids and I only need the values from the id list.

Upvotes: 2

Views: 1670

Answers (2)

BENY
BENY

Reputation: 323226

I am using reindex here

df.loc[df.Id.isin(ids)].groupby('Date').Id.value_counts().reindex(index=pd.MultiIndex.from_product([df.Date.unique(),ids]),fill_value=0)
Out[1116]: 
2/1/18  123    2
        456    1
        909    0
Name: Id, dtype: int64

Upvotes: 1

sjw
sjw

Reputation: 6543

Is this what you're after?

import pandas as pd

df = pd.DataFrame([['2/1/18', 123],
                  ['2/1/18', 123],
                  ['2/1/18', 456],
                  ['2/1/18', 789]],
                  columns=['Date', 'ID'])

ids = ['123','456','909']

df['count'] = 1
results = df[df['ID'].isin(ids)].groupby(['Date', 'ID']).count()

Upvotes: 4

Related Questions