Reputation: 61
I'm working on some code that generates features from a dataframe and adds these features as columns to the dataframe.
The trouble is that I'm working with a time series, so for any given row I need (let's say) the 5 previous rows to generate the corresponding feature for that row.
lookback_period = 5
df['feature1'] = np.zeros(len(df))  # preallocate
for index, row in df.iterrows():
    if index < lookback_period:
        continue
    window = df[index - lookback_period:index]  # the previous 5 rows
    some_int = SomeFxn(window)
    df.at[index, 'feature1'] = some_int  # write to df itself; `row` may be a copy
Is there a way to execute this code without explicitly looping through every row and then slicing?
One way is to create several lagged columns using df['column_name'].shift()
so that all the necessary information is contained in each row, but this quickly becomes intractable for my computer's memory since the dataset is large (millions of rows).
Upvotes: 3
Views: 87
Reputation: 1151
I don't have enough reputation to comment so will just post it here.
Can't you use apply on your DataFrame, e.g.:
df['feature1'] = df.apply(someRowFunction, axis=1)
where someRowFunction accepts the full row, and inside it you can perform whatever row-based slicing and logic you need.
--- updated ---
As we do not have much information about the DataFrame or the expected output, I have based this answer on the information from the comments.
Let's define a function that takes a DataFrame slice (based on the current row index and the lookback) together with the current row, and returns the sum of the first column of the slice plus the value of column b in the current row.
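import pandas as pd

def someRowFunction(window, row):
    # window: the lookback rows preceding the current row; row: the current row
    if window.shape[0] == 0:
        return 0
    return window[window.columns[0]].sum() + row.b

d = {'a': [1, 2, 3, 4, 5, 6, 7, 8, 9, 0], 'b': [0, 9, 8, 7, 6, 5, 4, 3, 2, 1]}
df = pd.DataFrame(data=d)
lookback = 5
# current_row.name is the row's index label; with this data the slice is empty for the first `lookback` rows
df['c'] = df.apply(lambda current_row: someRowFunction(df[current_row.name - lookback:current_row.name], current_row), axis=1)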
Inside apply, the row index is available through the row's name attribute, so we can use it to retrieve the required slice. The code above produces the following:
print(df)
   a  b   c
0  1  0   0
1  2  9   0
2  3  8   0
3  4  7   0
4  5  6   0
5  6  5  20
6  7  4  24
7  8  3  28
8  9  2  32
9  0  1  36
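Since the question asked about avoiding the explicit loop, here is a minimal sketch of the same calculation using pandas' rolling windows instead of a row-wise apply. It assumes the feature depends on a single column and a fixed lookback. The column name c_rolling and the function some_fxn are made up for illustration (some_fxn is only a stand-in for the asker's SomeFxn, which we have not seen), so treat this as one possible approach rather than a drop-in replacement.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7, 8, 9, 0],
                   'b': [0, 9, 8, 7, 6, 5, 4, 3, 2, 1]})
lookback = 5

# Sum of 'a' over the `lookback` rows strictly before each row:
# rolling() ends each window at the current row, so shift(1) moves it back by one.
prev_sum = df['a'].rolling(lookback).sum().shift(1)

# Rows without a full lookback window are NaN here; map them to 0 as in the apply version.
df['c_rolling'] = (prev_sum + df['b']).fillna(0).astype(int)

# Hypothetical custom window function (stand-in for SomeFxn).
# raw=True passes each window to the function as a NumPy array.
def some_fxn(window):
    return window.max() - window.min()

df['feature1'] = df['a'].rolling(lookback).apply(some_fxn, raw=True).shift(1).fillna(0)
```

For this example data, c_rolling matches column c above. The built-in aggregations (sum, mean, and so on) are usually much faster than rolling(...).apply with a Python function, so it is worth checking whether the feature can be expressed in terms of them.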
Upvotes: 1