Reputation: 338
I have a dataframe like this:
ID | Date |
---|---|
01 | 2020-01-02 |
01 | 2020-01-03 |
02 | 2020-01-02 |
I need to create a new column, that for each specific ID and Date, gives me the number of rows that have the same ID but of an earlier date.
So the output of the example df will look like this
ID | Date | Count |
---|---|---|
01 | 2020-01-02 | 0 |
01 | 2020-01-03 | 1 |
02 | 2020-01-02 | 0 |
I have tried working with aux tables, and also with group by using a lambda function, but with no real idea how to continue
Upvotes: 0
Views: 414
Reputation: 150
This will create a new column with the count.
df['Date'] = pd.to_datetime(df['Date'])
df['Count'] = df.groupby('ID')['Date'].rank(ascending=True).astype(int) - 1
Upvotes: 5
Reputation: 323226
Let us try factorize
df['new'] = df.sort_values('Date').groupby('ID')['Date'].transform(lambda x : x.factorize()[0])
df
ID Date new
0 1 2020-01-02 0
1 1 2020-01-03 1
2 2 2020-01-02 0
Upvotes: 0
Reputation: 45
First you need to be sure that you are comparing dates.
df["Date"] = pd.to_datetime(df['Date'], format="%Y-%m-%d")
Then you can create new column called 'Count' iterating over each row using df.apply.
def count_earlier_dates(row):
return df[df['Date'] < row['Date']].count()['ID']
df['Count'] = df.apply(lambda row: count_earlier_dates(row), axis=1)
Upvotes: 2