Reputation: 119
I have data in a Dataframe structured like so...
D K Factor p
0 -0.483128 -1.240024 -1.214765 -1.002418
1 -0.692334 -1.632132 1.562630 0.997304
2 -1.189383 -1.632132 1.562630 0.997304
3 -1.691841 -1.632132 1.562630 0.997304
4 -2.084926 -1.632132 1.562630 0.997304
I'm trying to reorganize the data into a new structure where each row contains 'period' number of rows from the existing data. Moves ahead one row in existing and the stacks the next 'period lines'.
the function so far:
def prepData(seq, period):
newStacks = pd.DataFrame()
for pos in range(0, len(seq) - (period+1), 1):
chunk = (seq[pos:pos + period])
stack = []
for sliver in range(0, len(chunk), 1):
piece = (chunk.iloc[sliver:])
print(piece)
stack.append(piece)
newStacks.append(chunk)
return newStacks
this obviously is not efficient and aside doesn't produce the desire structure. The aim is to get a structure like below considering period = 3
0 -0.483128 -1.240024 -1.214765 -1.002418 -0.692334 -1.632132 1.562630 0.997304 -1.189383 -1.632132 1.562630 0.997304
1 -0.692334 -1.632132 1.562630 0.997304 -1.189383 -1.632132 1.562630 0.997304 -1.691841 -1.632132 1.562630 0.997304
2 -1.189383 -1.632132 1.562630 0.997304 -1.691841 -1.632132 1.562630 0.997304 -2.084926 -1.632132 1.562630 0.997304
AA Simple way to accomplish this would be appreciated.
Upvotes: 0
Views: 75
Reputation: 862691
If there are all floats columns for improve performance use strides
in numpy with reshape 3d array
to 2d
and pass to DataFrame
constructor:
#https://stackoverflow.com/a/44306231/2901002 a bit changed
def strided_lastaxis(a, L):
s0,s1 = a.strides
m,n = a.shape
return np.lib.stride_tricks.as_strided(a, shape=(m-L+1,L,n), strides=(s0,s0,s1))
a = strided_lastaxis(df.to_numpy(), 3)
df1 = pd.DataFrame(a.reshape(a.shape[0], -1))
print (df1)
0 1 2 3 4 5 6 \
0 -0.483128 -1.240024 -1.214765 -1.002418 -0.692334 -1.632132 1.56263
1 -0.692334 -1.632132 1.562630 0.997304 -1.189383 -1.632132 1.56263
2 -1.189383 -1.632132 1.562630 0.997304 -1.691841 -1.632132 1.56263
7 8 9 10 11
0 0.997304 -1.189383 -1.632132 1.56263 0.997304
1 0.997304 -1.691841 -1.632132 1.56263 0.997304
2 0.997304 -2.084926 -1.632132 1.56263 0.997304
Upvotes: 2
Reputation: 8508
I am not sure if you want to create a dataframe or a list. Here's the code to get you both.
import pandas as pd
c = ['D','K','Factor','p']
d = [[-0.483128, -1.240024, -1.214765, -1.002418],
[-0.692334, -1.632132, 1.562630, 0.997304],
[-1.189383, -1.632132, 1.562630, 0.997304],
[-1.691841, -1.632132, 1.562630, 0.997304],
[-2.084926, -1.632132, 1.562630, 0.997304]]
df = pd.DataFrame(d,columns=c)
print (df)
p = 3 #this is the period you wanted. I set it to 3
stack_list = [] #this will store the final stacked list
#note: don't use stack, its used by pandas to stack
for i in range(len(df)-p+1): #iterating thru the dataframe
# convert p rows to a list after you stack them
chunk = df.loc[i:i+p-1].stack().reset_index(level=1,drop=True).tolist()
stack_list.append(chunk) #store chunk to stack_list
df1 = pd.DataFrame(stack_list) #creating a dataframe as per your request
#printing both stack_list and dataframe
print (stack_list)
print (df1)
The output of this will be:
The original dataframe is:
D K Factor p
0 -0.483128 -1.240024 -1.214765 -1.002418
1 -0.692334 -1.632132 1.562630 0.997304
2 -1.189383 -1.632132 1.562630 0.997304
3 -1.691841 -1.632132 1.562630 0.997304
4 -2.084926 -1.632132 1.562630 0.997304
The stacked list is:
[[-0.483128, -1.240024, -1.214765, -1.002418, -0.692334, -1.632132, 1.56263, 0.997304, -1.189383, -1.632132, 1.56263, 0.997304],
[-0.692334, -1.632132, 1.56263, 0.997304, -1.189383, -1.632132, 1.56263, 0.997304, -1.691841, -1.632132, 1.56263, 0.997304],
[-1.189383, -1.632132, 1.56263, 0.997304, -1.691841, -1.632132, 1.56263, 0.997304, -2.084926, -1.632132, 1.56263, 0.997304]]
The new dataframe that you wanted to get created is:
0 1 2 ... 9 10 11
0 -0.483128 -1.240024 -1.214765 ... -1.632132 1.56263 0.997304
1 -0.692334 -1.632132 1.562630 ... -1.632132 1.56263 0.997304
2 -1.189383 -1.632132 1.562630 ... -1.632132 1.56263 0.997304
Upvotes: 2