DanCor
DanCor

Reputation: 338

Obtain number of rows with the same ID but with an earlier date

I have a dataframe like this:

ID Date
01 2020-01-02
01 2020-01-03
02 2020-01-02

I need to create a new column, that for each specific ID and Date, gives me the number of rows that have the same ID but of an earlier date.

So the output of the example df will look like this

ID Date Count
01 2020-01-02 0
01 2020-01-03 1
02 2020-01-02 0

I have tried working with aux tables, and also with group by using a lambda function, but with no real idea how to continue

Upvotes: 0

Views: 414

Answers (3)

Sanjay Pant
Sanjay Pant

Reputation: 150

This will create a new column with the count.

df['Date'] = pd.to_datetime(df['Date'])
df['Count'] = df.groupby('ID')['Date'].rank(ascending=True).astype(int) - 1

Upvotes: 5

BENY
BENY

Reputation: 323226

Let us try factorize

df['new'] = df.sort_values('Date').groupby('ID')['Date'].transform(lambda x : x.factorize()[0])
df
   ID        Date  new
0   1  2020-01-02    0
1   1  2020-01-03    1
2   2  2020-01-02    0

Upvotes: 0

Cristobal Ignacio
Cristobal Ignacio

Reputation: 45

First you need to be sure that you are comparing dates.

df["Date"] = pd.to_datetime(df['Date'], format="%Y-%m-%d")

Then you can create new column called 'Count' iterating over each row using df.apply.

def count_earlier_dates(row):
    return df[df['Date'] < row['Date']].count()['ID']

df['Count'] = df.apply(lambda row: count_earlier_dates(row), axis=1)

Upvotes: 2

Related Questions