OJT

Reputation: 899

Aggregate Pandas DataFrame Counts by Dates in Groupby

I have a Pandas DataFrame with first and last date columns from which I have constructed a dr field with a list of dates inside the range:

_id     FID     first_seen    last_seen     dr
607abc  1925    2020-11-13    2021-04-22    DatetimeIndex(['2020-11-13', '2020-11-14', '20...])
608abd  1925    2021-03-05    2021-03-07    DatetimeIndex(['2021-03-05', '2021-03-06', '20...])
442xae  1331    2020-05-05    2021-04-22    DatetimeIndex(['2020-05-05', '2020-05-06', '20...])
...

For each FID, for each date associated with that FID, I need the count of _id values on that date. For instance, from the above:

FID     date          count(_id)
1925    2020-11-13    44
1925    2020-11-14    46
...
1925    2021-04-22    61
1331    2020-05-05    189
1331    2020-05-06    190
...

I know I need to group by FID first and get the min of first_seen and the max of last_seen, but then I get stuck...
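For reference, a minimal sketch of the setup described above (sample values are made up; the dr column is built with pd.date_range per row):

```python
import pandas as pd

# Illustrative data matching the columns in the question
df = pd.DataFrame({
    '_id': ['607abc', '608abd', '442xae'],
    'FID': [1925, 1925, 1331],
    'first_seen': pd.to_datetime(['2020-11-13', '2021-03-05', '2020-05-05']),
    'last_seen': pd.to_datetime(['2021-04-22', '2021-03-07', '2021-04-22']),
})

# dr: a DatetimeIndex of every date from first_seen to last_seen, inclusive
df['dr'] = [pd.date_range(s, e) for s, e in zip(df['first_seen'], df['last_seen'])]
```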

Upvotes: 0

Views: 175

Answers (1)

BENY

Reputation: 323226

Let us try explode

df.explode('dr').groupby(['FID','dr']).size()

Or we can try value_counts here

df.explode('dr').value_counts(['FID','dr'])
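As a quick sanity check on toy data (values are made up; columns as in the question), exploding dr turns each row into one row per date, after which the group sizes are the per-(FID, date) counts of _id:

```python
import pandas as pd

df = pd.DataFrame({
    '_id': ['a', 'b', 'c'],
    'FID': [1925, 1925, 1331],
    'dr': [
        pd.date_range('2020-11-13', '2020-11-14'),
        pd.date_range('2020-11-13', '2020-11-15'),
        pd.date_range('2020-05-05', '2020-05-06'),
    ],
})

# One row per (_id, date), then count rows per (FID, date)
out = (df.explode('dr')
         .groupby(['FID', 'dr'])
         .size()
         .reset_index(name='count'))
```

reset_index(name='count') is optional; it just flattens the result into the three-column shape shown in the question.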

Upvotes: 1

Related Questions