Reputation: 2565
I have a pandas dataframe with 7 columns and about 1000000 rows.
   column1   column2   column3   column4   column5   column6   column7
0        0  0.361690  0.377105  0.361405  0.374822  0.001909  0.368755
1        1  0.367399  0.376820  0.338567  0.356552  0.068900  0.359834
2        2  0.357122  0.390237  0.353982  0.359121  0.036614  0.365116
3        3  0.364545  0.405652  0.360263  0.387953  0.070556  0.379603
Here is a very simple custom function (for demonstration only) that I can apply to just ONE column:
def customFunction(df):
    if (df.mean() >= 0.5):
        result = True
    else:
        result = False
    return result
dataFrame["column8"] = dataFrame["column2"].rolling(window=2000).apply(customFunction)
Is there an efficient way to roll a window (of a certain size) over all columns of the dataframe at once and pass some other parameters as well? Something like this:
def customRollingFunctionWithMultipleColumns(dataFrame1, dataFrame2):
    dataFrame1 = functionToNormalizeData(dataFrame1)
    dataFrame1["column8"] = dataFrame2["compareAgainst"]
    dataFrame1["column9"] = np.where(((dataFrame1['column8'] <= dataFrame1['column2']) & (dataFrame1['column8'] >= dataFrame1['column3'])), 1, 0)
    result = dataFrame1.column9.sum()
    return result
dataFrame["column8"] = dataFrame.rolling(window=2000).apply(customRollingFunctionWithMultipleColumns(dataFrameWith2000Rows, dataFrame2))
Upvotes: 2
Views: 2941
Reputation: 210952
IIUC you can do this:
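def customRollingFunctionWithMultipleColumns(df1, df2):
    # The query runs against the global df; df1 (the window that
    # rolling.apply passes in, one column at a time) is not used here.
    qry = "column2 <= @df2.compareAgainst and @df2.compareAgainst <= column3"
    return (df.eval(qry)*1).sum()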
df2 = pd.DataFrame({'compareAgainst':(df.column3 + df.column2)/2})
df2.loc[[0,3]] *= 2
In [84]: df.rolling(window=2).apply(lambda x: customRollingFunctionWithMultipleColumns(x, df2))
Out[84]:
   column1  column2  column3  column4  column5  column6  column7
0      NaN      NaN      NaN      NaN      NaN      NaN      NaN
1      2.0      2.0      2.0      2.0      2.0      2.0      2.0
2      2.0      2.0      2.0      2.0      2.0      2.0      2.0
3      2.0      2.0      2.0      2.0      2.0      2.0      2.0
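Note that rolling(...).apply hands the function one column at a time as a 1-D window, so the function never sees several columns of the same window together. If you really need the whole window as a DataFrame, one option is to slice the rows yourself. The following is only a minimal sketch under a few assumptions: rolling_apply_frame and count_between are hypothetical helper names (not part of pandas), the comparison follows the column2 <= compareAgainst <= column3 form used above, and the per-window normalization step from the question is left out.

```python
import numpy as np
import pandas as pd


def rolling_apply_frame(df, window, func, *args):
    """Call func(window_df, *args) on every full window of rows.

    Hypothetical helper, not a pandas API: window_df is a DataFrame slice
    with all columns, so func can combine several columns at once.
    """
    out = np.full(len(df), np.nan)  # stays NaN until the first full window
    for end in range(window, len(df) + 1):
        out[end - 1] = func(df.iloc[end - window:end], *args)
    return pd.Series(out, index=df.index)


def count_between(window_df, other):
    """Count rows where column2 <= compareAgainst <= column3."""
    # Pick the rows of the second frame that match this window's index labels.
    cmp = other.loc[window_df.index, "compareAgainst"]
    hit = (window_df["column2"] <= cmp) & (cmp <= window_df["column3"])
    return int(hit.sum())


# The four sample rows from the question (only column2 and column3 are needed).
df = pd.DataFrame({
    "column2": [0.361690, 0.367399, 0.357122, 0.364545],
    "column3": [0.377105, 0.376820, 0.390237, 0.405652],
})
df2 = pd.DataFrame({"compareAgainst": (df.column3 + df.column2) / 2})
df2.loc[[0, 3]] *= 2  # push rows 0 and 3 outside the [column2, column3] band

result = rolling_apply_frame(df, 2, count_between, df2)

# If nothing has to be recomputed per window, the same counts come from
# flagging every row once and taking an ordinary rolling sum of the flags.
flags = ((df.column2 <= df2.compareAgainst)
         & (df2.compareAgainst <= df.column3)).astype(int)
fast = flags.rolling(window=2).sum()
```

On the sample rows both result and fast come out as NaN, 1.0, 2.0, 1.0. The explicit loop is plain Python and will likely be slow for 1000000 rows with a window of 2000, so the rolling-sum form is probably the one to prefer whenever the per-row test does not depend on the window itself.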
Upvotes: 3