I have a three-column dataset formatted as follows:
t_stamp,Xval,Ytval
0.000543,0,10
0.000575,0,10
0.041324,1,10
0.041331,2,10
0.041336,3,10
0.04134,4,10
0.041345,5,10
0.04135,6,10
0.041354,7,10
import pandas as pd

df = pd.read_csv('test.csv')
If I want three lagged (shifted) copies of the column, I can do it like this:
for i in range(1, 4):
    df['X_t' + str(i)] = df['Xval'].shift(i)
X = df[['Xval', 'X_t1', 'X_t2', 'X_t3']].values
and for five lags, like this:
for i in range(1, 6):
    df['X_t' + str(i)] = df['Xval'].shift(i)
X = df[['Xval', 'X_t1', 'X_t2', 'X_t3', 'X_t4', 'X_t5']].values
However, this does not scale to a large number of lags. How can I automate it so that I don't have to list the columns by hand, as in X = df[['Xval', 'X_t1', 'X_t2', 'X_t3']].values? For example, with 1000 lags, writing X = df[['Xval', 'X_t1', 'X_t2', 'X_t3', 'X_t4', ..., 'X_t1000']].values is impractical.
Use a dict comprehension to build a dict of shifted columns, create a temporary DataFrame from it, and extract its values:
In [1679]: pd.DataFrame({ 'X_%d'%i : df['Xval'].shift(i) for i in range(5)}).values
Out[1679]:
array([[ 0., nan, nan, nan, nan],
[ 0., 0., nan, nan, nan],
[ 1., 0., 0., nan, nan],
[ 2., 1., 0., 0., nan],
[ 3., 2., 1., 0., 0.],
[ 4., 3., 2., 1., 0.],
[ 5., 4., 3., 2., 1.],
[ 6., 5., 4., 3., 2.],
[ 7., 6., 5., 4., 3.]])
Similarly,
In [1680]: pd.DataFrame({ 'X_%d'%i : df['Xval'].shift(i) for i in range(10)}).values
Out[1680]:
array([[ 0., nan, nan, nan, nan, nan, nan, nan, nan, nan],
[ 0., 0., nan, nan, nan, nan, nan, nan, nan, nan],
[ 1., 0., 0., nan, nan, nan, nan, nan, nan, nan],
[ 2., 1., 0., 0., nan, nan, nan, nan, nan, nan],
[ 3., 2., 1., 0., 0., nan, nan, nan, nan, nan],
[ 4., 3., 2., 1., 0., 0., nan, nan, nan, nan],
[ 5., 4., 3., 2., 1., 0., 0., nan, nan, nan],
[ 6., 5., 4., 3., 2., 1., 0., 0., nan, nan],
[ 7., 6., 5., 4., 3., 2., 1., 0., 0., nan]])
range starts from 0 because df['Xval'].shift(0) is just Xval itself, which you want as well.
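A minimal sketch wrapping the comprehension in a reusable function. The helper name lag_matrix and the inline sample Series (only the Xval column from the question, instead of reading test.csv) are assumptions for illustration. Because the DataFrame constructor keeps dict insertion order for columns (pandas 0.23+ on Python 3.6+), the columns come out as X_0, X_1, ..., X_10, ... in numeric order even past ten lags; older pandas versions sorted dict keys as strings.

```python
import pandas as pd


def lag_matrix(series, n_cols):
    """Hypothetical helper: column i of the result is series.shift(i).

    Column 0 uses shift(0), i.e. the unshifted series itself.
    Returns the column names and the underlying NumPy array.
    """
    lagged = {'X_%d' % i: series.shift(i) for i in range(n_cols)}
    # Build the DataFrame once instead of inserting columns one at a time
    frame = pd.DataFrame(lagged)
    return frame.columns.tolist(), frame.values


# Assumed sample: the Xval column from the question
xval = pd.Series([0, 0, 1, 2, 3, 4, 5, 6, 7], name='Xval')

# shift(0) leaves every value unchanged
assert (xval.shift(0) == xval).all()

cols, X = lag_matrix(xval, 12)
print(cols[:3], cols[-3:])  # column names stay in numeric order
print(X.shape)
```

The same call with n_cols=1001 gives Xval plus 1000 lags without listing any column by hand.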
If you want to replace those NaN values, call fillna(0) on the temporary DataFrame first, and then extract its values:
pd.DataFrame({ 'X_%d'%i : df['Xval'].shift(i) for i in range(5)}).fillna(0).values
Alternatively, apply np.nan_to_num to each column:
pd.DataFrame({ 'X_%d'%i : df['Xval'].shift(i) for i in range(5)}).apply(np.nan_to_num, axis=0).values
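Both NaN-replacement routes give the same array. The sketch below compares them on the question's Xval values (entered inline as an assumption rather than read from test.csv). It also shows that np.nan_to_num accepts the whole array in one call, which makes the apply step optional.

```python
import numpy as np
import pandas as pd

# Assumed sample: the Xval column from the question
xval = pd.Series([0, 0, 1, 2, 3, 4, 5, 6, 7], name='Xval')
lagged = pd.DataFrame({'X_%d' % i: xval.shift(i) for i in range(5)})

filled = lagged.fillna(0).values                          # pandas replaces NaN with 0
per_column = lagged.apply(np.nan_to_num, axis=0).values   # NumPy, one column at a time
whole = np.nan_to_num(lagged.values)                      # NumPy, whole array in one call

# All three hold the same numbers, with the leading NaNs turned into 0
assert np.array_equal(filled, per_column)
assert np.array_equal(filled, whole)
print(filled[:3])
```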
Alternatively, use dropna if you are okay with the result being truncated:
In [1763]: pd.DataFrame({ 'X_%d'%i : df['Xval'].shift(i) for i in range(5)}).dropna().values
Out[1763]:
array([[ 3., 2., 1., 0., 0.],
[ 4., 3., 2., 1., 0.],
[ 5., 4., 3., 2., 1.],
[ 6., 5., 4., 3., 2.],
[ 7., 6., 5., 4., 3.]])
Compared with the first example, all rows containing NaN have been dropped.
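With dropna, n - 1 rows are lost for n columns (4 rows for range(5)). If those rows are going to be dropped anyway and the number of lags is large, e.g. 1000, NumPy's sliding_window_view can build the same truncated matrix without creating NaNs first. This is a different technique from the shift-based answer above and requires NumPy 1.20 or later; the sketch again assumes the sample Xval values are entered inline.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

# Assumed sample: the Xval column from the question
xval = pd.Series([0, 0, 1, 2, 3, 4, 5, 6, 7], name='Xval')
n_cols = 5  # X_0 (unshifted) through X_4

# pandas route from the answer: rows with NaN at the top are dropped
via_dropna = pd.DataFrame(
    {'X_%d' % i: xval.shift(i) for i in range(n_cols)}
).dropna().values

# NumPy route: row j is the window xval[j:j + n_cols]; reversing each window
# puts the newest value first, matching the X_0, X_1, ... column order
windows = sliding_window_view(xval.to_numpy(), n_cols)[:, ::-1]

assert windows.shape == (len(xval) - (n_cols - 1), n_cols)  # n_cols - 1 rows lost
assert np.array_equal(via_dropna, windows)
print(windows)
```

sliding_window_view returns a read-only view of the original buffer, so call .copy() on the result if you need to modify it.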