Reputation: 1010
I have the following dataframe, which has about 4,000 tickers and about 2 million rows in total:
Ticker Date Rank
1 01/01/2000 5
1 01/02/2000 NaN
2 01/01/2000 4
2 01/02/2000 2
I now run the following code to carry forward the Rank column, which works totally fine:
import pandas as pd
df= df.sort_values(by=["Ticker", "Date"], ascending=[True,True])
df['Rank'] = df.groupby('Ticker')['Rank'].fillna(value=None, method="ffill")
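For reference, this per-group forward fill can be sketched with `DataFrameGroupBy.ffill()`, the idiomatic form in current pandas (`fillna(method=...)` is deprecated there); the frame below mirrors the sample above:

```python
import numpy as np
import pandas as pd

# sample mirroring the frame above
df = pd.DataFrame({
    "Ticker": [1, 1, 2, 2],
    "Date": ["01/01/2000", "01/02/2000", "01/01/2000", "01/02/2000"],
    "Rank": [5, np.nan, 4, 2],
})

df = df.sort_values(by=["Ticker", "Date"])
# ffill() within each Ticker group carries the last seen Rank forward
df["Rank"] = df.groupby("Ticker")["Rank"].ffill()
```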
However, I now want to carry forward a different column. In order to create this column I do the following:
import numpy as np
df["Code"]=np.NaN
I then write some code that replaces about 200 values with 1, according to the date and ticker values in the df "add". This code works and looks like the following:
df["Code"][(df.Date == add) & (df["Ticker"] == column)] = 1
This makes my dataframe look like this:
Ticker Date Rank Code
1 01/01/2000 5 NaN
1 01/02/2000 NaN NaN
2 01/01/2000 4 1
2 01/02/2000 2 NaN
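As an aside, the chained assignment `df["Code"][mask] = 1` can raise SettingWithCopyWarning and may write to a temporary copy; a sketch of the equivalent single `.loc` assignment (here `add` and `column` are stand-in values for one date/ticker pair):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Ticker": [1, 1, 2, 2],
    "Date": ["01/01/2000", "01/02/2000", "01/01/2000", "01/02/2000"],
    "Rank": [5, np.nan, 4, 2],
})
df["Code"] = np.nan

add = "01/01/2000"   # example date value
column = 2           # example ticker value

# one boolean mask, one .loc assignment -- no chained indexing
df.loc[(df["Date"] == add) & (df["Ticker"] == column), "Code"] = 1
```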
Now, I want to carry forward this column, but the code takes forever.
import pandas as pd
df= df.sort_values(by=["Ticker", "Date"], ascending=[True,True])
df['Code'] = df.groupby('Ticker')['Code'].fillna(value=None, method="ffill")
I ran it for two days and my PC crashed. There must be some mistake in the way I am doing things, because the carryforward above runs fast and this one does not even finish. I checked the dtype of "Code" and it is float64.
Can anyone help?
Upvotes: 1
Views: 1801
Reputation: 862771
I tried to improve your code a bit with sort=False in groupby and DataFrameGroupBy.ffill:
# convert the column to real datetimes so sorting is correct
df["Date"] = pd.to_datetime(df["Date"])
# ascending can be omitted because True is the default
df = df.sort_values(by=["Ticker", "Date"])
df['Rank'] = df.groupby('Ticker', sort=False)['Rank'].ffill()
add = '01/01/2000'
column = 2
df.loc[(df.Date == add) & (df["Ticker"] == column), "Code"] = 1
# sorting again is not necessary
df['Code'] = df.groupby('Ticker', sort=False)['Code'].ffill()
print(df)
Ticker Date Rank Code
0 1 2000-01-01 5.0 NaN
1 1 2000-01-02 5.0 NaN
2 2 2000-01-01 4.0 1.0
3 2 2000-01-02 2.0 1.0
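Putting the answer's steps together with the sample data, a self-contained sketch you can run as-is (the answer above assumes `df` already exists):

```python
import numpy as np
import pandas as pd

# rebuild the sample frame from the question
df = pd.DataFrame({
    "Ticker": [1, 1, 2, 2],
    "Date": ["01/01/2000", "01/02/2000", "01/01/2000", "01/02/2000"],
    "Rank": [5, np.nan, 4, 2],
})

df["Date"] = pd.to_datetime(df["Date"])          # real datetimes sort correctly
df = df.sort_values(by=["Ticker", "Date"])
df["Rank"] = df.groupby("Ticker", sort=False)["Rank"].ffill()

# flag one (date, ticker) pair, then carry the flag forward per ticker
add, column = "2000-01-01", 2
df.loc[(df["Date"] == add) & (df["Ticker"] == column), "Code"] = 1
df["Code"] = df.groupby("Ticker", sort=False)["Code"].ffill()
```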
Upvotes: 1