alladinsane

Reputation: 185

Pandas counting rows with dates

I have a DataFrame like this:

      Date         X       Y
0  2002-01-01     ...     ...
1  2002-01-01     ...     ...
2  2002-01-03     ...     ...
3  2002-01-04     ...     ...
4  2002-01-04     ...     ...
5  2002-01-04     ...     ...

My goal is to add a column that counts the rows sharing the same date, and to drop the duplicates:

      Date         X       Y      Count
0  2002-01-01     ...     ...       2
1  2002-01-03     ...     ...       1
2  2002-01-04     ...     ...       3

I've read a few posts and tried .unique(), .size(), .transform(), and .value_counts(), but none of them got me there. Even a simple .drop_duplicates(subset='Date') doesn't work.

Edit: the Date column was created with .dt.date.

Upvotes: 1

Views: 5928

Answers (2)

kantal

Reputation: 2407

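Try this:

# Count rows per date; sort=False keeps the dates in order of first appearance,
# which is the same order as the rows kept by drop_duplicates below.
a = df.groupby("Date", sort=False).size().values
# Keep the first row for each date and attach the counts by position.
df = df.drop_duplicates(subset="Date").assign(Count=a)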
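
Since the counts are attached by position, they have to come out in the same order as the rows that drop_duplicates keeps. A minimal sketch of that point, assuming made-up, unsorted dates and a placeholder X column (neither is from the question), and passing sort=False so the groups follow the order of first appearance:

```python
import pandas as pd

# Hypothetical data with dates deliberately out of order; X is a placeholder.
df = pd.DataFrame({
    "Date": ["2002-01-04", "2002-01-01", "2002-01-04",
             "2002-01-03", "2002-01-04"],
    "X": [10, 20, 30, 40, 50],
})

# sort=False: group sizes are returned in order of first appearance,
# matching the rows that drop_duplicates keeps (the first row per date).
counts = df.groupby("Date", sort=False).size().values
result = df.drop_duplicates(subset="Date").assign(Count=counts)
print(result)
```

With the default sort=True the sizes would come back in sorted date order, so on unsorted input they could end up next to the wrong dates.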

Upvotes: 3

Valdi_Bo

Reputation: 30971

Start by computing how many times each date occurs:

cnt = df.groupby('Date').size().rename('Count')

The name given to cnt (Count) becomes the name of the corresponding column in the result.

Then compute the result:

result = df.drop_duplicates(subset='Date')\
    .merge(cnt, left_on='Date', right_index=True)

The steps are:

  • Drop duplicates (by default the first row for each date is retained).
  • Add the Count column from cnt. The index values of cnt (dates) are matched against the Date column.
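
The steps above can be sketched end to end as follows. The sample frame is an assumption: the X and Y values are made up, and Date is built with .dt.date as the question's edit describes. The final reset_index(drop=True) is an optional addition, used only to renumber the rows 0, 1, 2 as in the desired output.

```python
import pandas as pd

# Hypothetical sample data mirroring the question; X and Y values are made up.
dates = pd.to_datetime(pd.Series([
    "2002-01-01", "2002-01-01", "2002-01-03",
    "2002-01-04", "2002-01-04", "2002-01-04",
]))
df = pd.DataFrame({
    "Date": dates.dt.date,        # datetime.date objects, as in the question
    "X": [1, 2, 3, 4, 5, 6],
    "Y": list("abcdef"),
})

# Number of rows per date, as a Series named "Count" indexed by date.
cnt = df.groupby("Date").size().rename("Count")

# Keep the first row per date, then look up each date's count in cnt's index.
result = (
    df.drop_duplicates(subset="Date")
      .merge(cnt, left_on="Date", right_index=True)
      .reset_index(drop=True)     # optional: renumber rows from 0
)
print(result)
```

Because the merge matches on the date values rather than on position, this variant does not depend on the row order of df.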

Upvotes: 1
