Groupby count to printed to individual row

Question

I have a dataframe that is similar to:

Date        Name
2020-04-01  ABCD
2020-04-01  Test
2020-04-01  Run1
2020-04-02  Run1
2020-04-03  XXX1
2020-04-03  Test

I want to groupby date and enumerate the number of datapoints for that day. I also want a column for the cumulative count of that date for every datapoint. Essentially, the two columns will give a quick reference of data: on 4-15-20 scan 10 of 23. This is the desired result:

Date        Name    #   Total Scans
2020-04-01  ABCD    1   3
2020-04-01  Test    2   3
2020-04-01  Run1    3   3
2020-04-02  Run1    1   1
2020-04-03  XXX1    1   2
2020-04-03  Test    2   2

So far I have:

>>>df["#"]=std.groupby(['Date']).cumcount()+1

Date        Name    #
2020-04-01  ABCD    1
2020-04-01  Test    2
2020-04-01  Run1    3
2020-04-02  Run1    1
2020-04-03  XXX1    1
2020-04-03  Test    2

However, I'm having trouble adding the last column without needing to iterate over the dataset. Everything I've read says iterating over a dataframe is a no-no and the size of the file causes tremendous lag testing, confirming iteration is a bad idea.

Can anyone give me input here without needing iteration? Thanks

Quang Hoang · Accepted Answer

Let's do:

groups = df.groupby('Date')
df['#'] = groups.cumcount() + 1
df['Total Scans'] = groups['Date'].transform('size')

output:

         Date  Name  #  Total Scans
0  2020-04-01  ABCD  1            3
1  2020-04-01  Test  2            3
2  2020-04-01  Run1  3            3
3  2020-04-02  Run1  1            1
4  2020-04-03  XXX1  1            2
5  2020-04-03  Test  2            2

Groupby count to printed to individual row

Answers (1)

Related Questions