Nili

Reputation: 333

Splitting Pandas dataframe

I'd like to split my time-series data into X and y by shifting the data. The dummy dataframe looks like:

[image: dummy dataframe]

That is, with a time step of 2, X and y should look like:

X=[3,0] -> y=[5]

X=[0,5] -> y=[7]

This should be applied to every sample (row).

I wrote the function below, but it returns empty arrays when I pass a pandas DataFrame to it.

def create_dataset(dataset, time_step=1):
    dataX, dataY = [], []
    for i in range(len(dataset) - time_step - 1):
        a = dataset.iloc[:, i:(i + time_step)]
        dataX.append(a)
        dataY.append(dataset.iloc[:, i + time_step])
    return np.array(dataX), np.array(dataY)

Thank you for any solutions.

Upvotes: 0

Views: 86

Answers (2)

jsmart

Reputation: 3001

Here is an example that reproduces the expected output, if I understand correctly:

import pandas as pd

# function to process each row
def process_row(s):
    assert isinstance(s, pd.Series)
    return pd.concat([
        s.rename('timestep'),
        s.shift(-1).rename('x_1'),
        s.shift(-2).rename('x_2'),
        s.shift(-3).rename('y')
    ], axis=1).dropna(how='any', axis=0).astype(int)

# test case for the example
process_row( pd.Series([2, 3, 0, 5, 6]) )

# type in first two rows of the data frame
df = pd.DataFrame(
    {'x-2': [3, 2], 'x-1': [0, 3], 
     'x0': [5, 0], 'x1': [7, 5], 'x2': [1, 6]})

# perform the transformation
ts = list()

for idx, row in df.iterrows():
    t = process_row(row)
    t.index = [idx] * t.index.size
    ts.append(t)
    
print(pd.concat(ts))

# results
   timestep  x_1  x_2  y
0         3    0    5  7
0         0    5    7  1
1         2    3    0  5   <-- first part of expected results
1         3    0    5  6   <-- second part
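For comparison, here is a minimal sketch of a column-wise version of the question's `create_dataset`. It assumes that each row is an independent series and that the columns are consecutive time steps. Using NumPy's `sliding_window_view` is my own substitution, not something from the question. The original loop ranges over `len(dataset)` (the number of rows) while slicing columns, so with only a few rows the range is empty, which would explain the empty result.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def create_dataset(dataset: pd.DataFrame, time_step: int = 1):
    """Split each row into (X, y) pairs using windows taken along the columns.

    Assumption: columns are consecutive time steps, rows are separate series.
    """
    values = dataset.to_numpy()
    # Each window holds time_step inputs plus one target value.
    # Resulting shape: (n_rows, n_windows, time_step + 1)
    windows = sliding_window_view(values, window_shape=time_step + 1, axis=1)
    X = windows[..., :-1].reshape(-1, time_step)  # all but the last value
    y = windows[..., -1].reshape(-1)              # the last value of each window
    return X, y


# First two rows of the example data frame, as typed in above
df = pd.DataFrame(
    {'x-2': [3, 2], 'x-1': [0, 3],
     'x0': [5, 0], 'x1': [7, 5], 'x2': [1, 6]})

X, y = create_dataset(df, time_step=2)
print(X)
print(y)
```

With `time_step=2`, the first row yields X=[3,0] -> y=5 and X=[0,5] -> y=7, matching the pairs given in the question. `sliding_window_view` requires NumPy 1.20 or later.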

Upvotes: 1

GhandiFloss

Reputation: 384

Do you mean something like this?

df = df.shift(periods=-2, axis='columns')

# you can also pass a fill_value parameter to replace the resulting NaNs
df = df.shift(periods=-2, axis='columns', fill_value=0)
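To make the effect concrete, here is a small sketch that applies the same call to the two rows typed in by the other answer. The data frame is an assumption, since the original data was only shown as an image. As far as I can tell, the shift only moves values two columns to the left and does not by itself produce separate X and y arrays.

```python
import pandas as pd

# Assumed sample data: the two rows used in the other answer
df = pd.DataFrame(
    {'x-2': [3, 2], 'x-1': [0, 3],
     'x0': [5, 0], 'x1': [7, 5], 'x2': [1, 6]})

# Move every value two columns to the left; vacated cells are filled with 0
shifted = df.shift(periods=-2, axis='columns', fill_value=0)
print(shifted)
```

The column labels stay the same and only the values move, so the last two columns end up holding the fill value.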

Upvotes: 0
