Reputation: 37
I have a dataframe that is similar to:
Date Name
2020-04-01 ABCD
2020-04-01 Test
2020-04-01 Run1
2020-04-02 Run1
2020-04-03 XXX1
2020-04-03 Test
I want to groupby date and enumerate the number of datapoints for that day. I also want a column for the cumulative count of that date for every datapoint. Essentially, the two columns will give a quick reference of data: on 4-15-20 scan 10 of 23. This is the desired result:
Date Name # Total Scans
2020-04-01 ABCD 1 3
2020-04-01 Test 2 3
2020-04-01 Run1 3 3
2020-04-02 Run1 1 1
2020-04-03 XXX1 1 2
2020-04-03 Test 2 2
So far I have:
>>>df["#"]=std.groupby(['Date']).cumcount()+1
Date Name #
2020-04-01 ABCD 1
2020-04-01 Test 2
2020-04-01 Run1 3
2020-04-02 Run1 1
2020-04-03 XXX1 1
2020-04-03 Test 2
However, I'm having trouble adding the last column without needing to iterate over the dataset. Everything I've read says iterating over a dataframe is a no-no and the size of the file causes tremendous lag testing, confirming iteration is a bad idea.
Can anyone give me input here without needing iteration? Thanks
Upvotes: 0
Views: 93
Reputation: 150785
Let's do:
groups = df.groupby('Date')
df['#'] = groups.cumcount() + 1
df['Total Scans'] = groups['Date'].transform('size')
output:
Date Name # Total Scans
0 2020-04-01 ABCD 1 3
1 2020-04-01 Test 2 3
2 2020-04-01 Run1 3 3
3 2020-04-02 Run1 1 1
4 2020-04-03 XXX1 1 2
5 2020-04-03 Test 2 2
Upvotes: 2