Reputation: 1133
I have a pandas df as follows:
Value1 Value2 Label
15.1 12 0
17 5 1
19 2 1
I am looking to build a new df such that each row also contains the inputs of the previous n
rows. For example, if n=2,
my output should be:
Value1.1 Value2.1 Value1.2 Value2.2 Value1 Value2 Label
15.1 12 17 5 19 2 1
This is the third row and has Label=1; the Value1 and Value2 of the previous 2 rows are appended to the third row.
Any thoughts on how I can achieve this in Python?
Thanks!
Upvotes: 0
Views: 52
Reputation: 18367
Maybe this helps you get the job done. Here I'm defining the variable n
as the number of previous rows you want to use:
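import pandas as pd

n = 2  # number of previous rows to include
df = pd.DataFrame({'Value1': [15.1, 17, 19],
                   'Value2': [12, 5, 2],
                   'Label': [0, 1, 1]})
feats = list(df)[:-1]  # every column except the trailing Label
# Suffix the feature names with .1 ... .n, oldest row first
# (this also works when the number of feature columns differs from n)
cols = [f"{c}.{k}" for k in range(1, n + 1) for c in feats]
# Flatten the features of the first n rows, then append row n in full
df_out = pd.DataFrame({'vals': df.iloc[:n, :-1].values.flatten().tolist() + df.iloc[n, :].values.tolist()},
                      index=cols + list(df)).T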
The output is as expected:
Value1.1 Value2.1 Value1.2 Value2.2 Value1 Value2 Label
vals 15.1 12.0 17.0 5.0 19.0 2.0 1.0
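The snippet above builds only the single row at position n. As a minimal sketch, the same flattening can be repeated for every row that has n predecessors. The helper name `window_rows` is hypothetical, and it assumes that the label is the last column of the frame:

```python
import pandas as pd

def window_rows(df, n):
    """Hypothetical helper: one output row per input row that has n predecessors."""
    feats = list(df)[:-1]  # assumption: the label is the last column
    cols = [f"{c}.{k}" for k in range(1, n + 1) for c in feats] + list(df)
    data, idx = [], []
    for i in range(n, len(df)):
        # Features of the n rows before row i, oldest row first
        prev = df.iloc[i - n:i, :-1].values.flatten().tolist()
        data.append(prev + df.iloc[i, :].values.tolist())
        idx.append(df.index[i])
    return pd.DataFrame(data, index=idx, columns=cols)

df = pd.DataFrame({'Value1': [15.1, 17, 19],
                   'Value2': [12, 5, 2],
                   'Label': [0, 1, 1]})
out = window_rows(df, 2)
print(out)
```

With the three-row frame from the question this yields a single row (index 2). Note that all values come out as floats because each row is flattened through a NumPy array.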
Upvotes: 1
Reputation: 26221
Perhaps something like:
n = 2
sel = [k for k in df.columns if k != 'Label']
df2 = df
for k in range(1, n + 1):
    df2 = df2.join(df[sel].shift(k), rsuffix=f'.{k}')
print(df2)
Value1 Value2 Label Value1.1 Value2.1 Value1.2 Value2.2
0 15.1 12 0 NaN NaN NaN NaN
1 17.0 5 1 15.1 12.0 NaN NaN
2 19.0 2 1 17.0 5.0 15.1 12.0
Or, if you prefer the column order you indicated in your example:
df2 = df
for k in range(1, n + 1):
    df2 = df[sel].shift(k).join(df2, lsuffix=f'.{k}')
print(df2)
Value1.2 Value2.2 Value1.1 Value2.1 Value1 Value2 Label
0 NaN NaN NaN NaN 15.1 12 0
1 NaN NaN 15.1 12.0 17.0 5 1
2 15.1 12.0 17.0 5.0 19.0 2 1
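In the outputs above, the suffix .k marks the row k steps back, whereas the question's example numbers from the oldest row (.1 holds 15.1 and 12) and keeps only rows with a full history. A possible variation, sketched here with `add_suffix` and `pd.concat` instead of `join` (the name `df3` is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'Value1': [15.1, 17, 19],
                   'Value2': [12, 5, 2],
                   'Label': [0, 1, 1]})
n = 2
sel = [c for c in df.columns if c != 'Label']

# Oldest row first: a shift of n gets suffix .1, a shift of 1 gets suffix .n
parts = [df[sel].shift(n - k + 1).add_suffix(f'.{k}') for k in range(1, n + 1)]
# Put the current row last, then drop the first n rows, which lack a full history
df3 = pd.concat(parts + [df], axis=1).iloc[n:]
print(df3)
```

Slicing with `.iloc[n:]` rather than calling `dropna()` keeps rows whose original data legitimately contain NaN.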
Upvotes: 1