Reputation: 185
I have a DataFrame like this:
Date X Y
0 2002-01-01 ... ...
1 2002-01-01 ... ...
2 2002-01-03 ... ...
3 2002-01-04 ... ...
4 2002-01-04 ... ...
5 2002-01-04 ... ...
My goal is to add a column that counts the rows sharing each date, and to drop the duplicate dates:
Date X Y Count
0 2002-01-01 ... ... 2
1 2002-01-03 ... ... 1
2 2002-01-04 ... ... 3
I've read a few posts and tried .unique, .size(), .transform(), and .value_counts(), but none of them got me there. Even a simple .drop_duplicates(subset='Date') doesn't work.
Edit: the Date column was created with .dt.date.
Upvotes: 1
Views: 5928
Reputation: 2407
Try this:
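# sort=False keeps groups in first-appearance order, matching the rows drop_duplicates keeps
a = df.groupby("Date", sort=False).size().values
df = df.drop_duplicates(subset="Date").assign(Count=a)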
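The snippet above attaches the counts by position, so it depends on the group order matching the order of the rows that drop_duplicates keeps. As a minimal sketch of an alternative that does not rely on position, the counts can be computed per row with transform("size") before dropping duplicates. This is a different technique from the one in this answer, and the X and Y values below are made up for illustration:

```python
import pandas as pd
from datetime import date

# Hypothetical sample data; the X and Y values are invented for this sketch.
df = pd.DataFrame({
    "Date": [date(2002, 1, 1), date(2002, 1, 1), date(2002, 1, 3),
             date(2002, 1, 4), date(2002, 1, 4), date(2002, 1, 4)],
    "X": [1, 2, 3, 4, 5, 6],
    "Y": [10, 20, 30, 40, 50, 60],
})

# transform("size") returns one value per original row (the size of that
# row's group), aligned on the original index rather than by position.
counts = df.groupby("Date")["Date"].transform("size")

result = (
    df.assign(Count=counts)          # every row now carries its group size
      .drop_duplicates(subset="Date")  # keep the first row per date
      .reset_index(drop=True)
)
print(result)
```

Because the counts are aligned on the index, this should give the same Count values even when the dates are not sorted.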
Upvotes: 3
Reputation: 30971
Start by computing how many times each date occurs:
cnt = df.groupby('Date').size().rename('Count')
The name assigned to cnt (Count) becomes the name of the corresponding column in the result.
Then compute the result:
result = df.drop_duplicates(subset='Date')\
.merge(cnt, left_on='Date', right_index=True)
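The steps are: drop_duplicates keeps only the first row for each date, and merge then attaches the matching Count by joining the Date column against the index of cnt.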
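Put together, the two steps can be tried end to end with a small sketch like the following. The sample values are hypothetical, and the Date column is built with .dt.date only to mirror the question's edit:

```python
import pandas as pd

# Hypothetical sample data; X and Y are placeholders for the real columns.
raw = pd.DataFrame({
    "Date": pd.to_datetime(["2002-01-01", "2002-01-01", "2002-01-03",
                            "2002-01-04", "2002-01-04", "2002-01-04"]),
    "X": list("abcdef"),
    "Y": list(range(6)),
})
# Mirror the question: Date holds plain datetime.date objects (object dtype).
df = raw.assign(Date=raw["Date"].dt.date)

# Step 1: number of rows per date, as a Series named "Count" indexed by Date.
cnt = df.groupby("Date").size().rename("Count")

# Step 2: keep the first row per date, then look up its count via the index of cnt.
result = (
    df.drop_duplicates(subset="Date")
      .merge(cnt, left_on="Date", right_index=True)
      .reset_index(drop=True)
)
print(result)
```

The merge matches on the date values themselves, so the result does not depend on the rows being sorted by date.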
Upvotes: 1