Reputation: 7746
I have several dataframes that share the same structure but hold different data.
DataFrame 1
                          bid
                        close
time
2016-05-24 00:00:00       NaN
2016-05-24 00:05:00  0.000611
2016-05-24 00:10:00 -0.000244
2016-05-24 00:15:00 -0.000122
DataFrame 2
                          bid
                        close
time
2016-05-24 00:00:00       NaN
2016-05-24 00:05:00  0.000811
2016-05-24 00:10:00 -0.000744
2016-05-24 00:15:00 -0.000322
I need to build a list of these dataframes and pass it to a function that converts the list to a numpy array. Each row of the resulting matrix should hold the values of one dataframe's ('bid', 'close') column. Note that I don't need the 'time' index.
data = np.array([dataFrames])
returns this (example not actual data)
[[-0.00114415 0.02502565 0.00507831 ..., 0.00653057 0.02183072 -0.00194293]   <- DataFrame 1 (ignore that the data doesn't match the above)
[-0.01527224 0.02899528 -0.00327654 ..., 0.0322364 0.01821731 -0.00766773]   <- DataFrame 2 (ignore that the data doesn't match the above)
....]]
Upvotes: 2
Views: 118
Reputation: 294218
import pandas as pd
import numpy as np
df1 = pd.DataFrame([1, 2, 3, 4],
index=pd.date_range('2016-04-01', periods=4),
columns=pd.MultiIndex.from_tuples([('bid', 'close')]))
df2 = pd.DataFrame([5, 6, 7, 8],
index=pd.date_range('2016-03-01', periods=4),
columns=pd.MultiIndex.from_tuples([('bid', 'close')]))
print(df1)
           bid
         close
2016-04-01   1
2016-04-02   2
2016-04-03   3
2016-04-04   4
print(df2)
           bid
         close
2016-03-01   5
2016-03-02   6
2016-03-03   7
2016-03-04   8
df = np.concatenate([d.T.values for d in [df1, df2]])
print(df)
[[1 2 3 4]
[5 6 7 8]]
The indices are not required to line up. This just takes the raw NumPy array from each dataframe (transposed, via .T.values) and uses np.concatenate to do the rest.
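Since the question asks for a function that takes a list of dataframes, here is a minimal sketch of one way to wrap the idea. The name `frames_to_matrix` and its `column` parameter are hypothetical, and the sketch assumes every dataframe has that column and the same number of rows.

```python
import numpy as np
import pandas as pd


def frames_to_matrix(frames, column=("bid", "close")):
    """Stack one column from each dataframe into a 2-D array, one row per dataframe.

    Hypothetical helper: assumes every frame has `column` and the same length.
    The index of each frame is ignored entirely.
    """
    return np.vstack([frame[column].to_numpy() for frame in frames])


cols = pd.MultiIndex.from_tuples([("bid", "close")])
a = pd.DataFrame([1, 2, 3, 4], index=pd.date_range("2016-04-01", periods=4), columns=cols)
b = pd.DataFrame([5, 6, 7, 8], index=pd.date_range("2016-03-01", periods=4), columns=cols)

matrix = frames_to_matrix([a, b])
print(matrix.shape)  # (2, 4)
```

Selecting the column by name rather than transposing the whole frame keeps the helper usable if the dataframes later gain extra columns.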
Upvotes: 1
Reputation: 11602
Try
master_matrix = pd.concat(list_of_dfs, axis=1)
master_matrix = master_matrix.values.reshape(master_matrix.shape, order='F')
if each row in the final matrix corresponds to the same date
master_matrix = pd.concat(list_of_dfs, axis=1).values
otherwise.
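For reference, pd.concat with axis=1 aligns the frames on their index, so timestamps missing from one frame show up as NaN. If one row per dataframe is wanted, transposing the result is one option. A small sketch with made-up values (the frames `x` and `y` are illustrative only, not the data from the question):

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2016-05-24", periods=3, freq="5min")
x = pd.DataFrame({"bid_close": [0.1, 0.2, 0.3]}, index=idx)
# y starts one step later, so it only partly overlaps x.
y = pd.DataFrame({"bid_close": [1.0, 2.0, 3.0]}, index=idx + pd.Timedelta("5min"))

aligned = pd.concat([x, y], axis=1)  # outer join on the index: 4 rows, 2 columns
by_frame = aligned.to_numpy().T      # transpose: one row per dataframe
print(by_frame.shape)                # (2, 4)
```

Here the last entry of the first row and the first entry of the second row are NaN, because each frame lacks one of the four timestamps.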
Edit to address the newly added example.
In this case, you can use np.vstack on the columns returned from each dataframe.
import pandas as pd
import numpy as np
from io import StringIO
df1 = pd.read_csv(StringIO(
'''
time bid_close
2016-05-24 00:00:00 NaN
2016-05-24 00:05:00 0.000611
2016-05-24 00:10:00 -0.000244
2016-05-24 00:15:00 -0.000122
'''), sep=r' +', engine='python')
df2 = pd.read_csv(StringIO(
'''
time bid_close
2016-05-24 00:00:00 NaN
2016-05-24 00:05:00 0.000811
2016-05-24 00:10:00 -0.000744
2016-05-24 00:15:00 -0.000322
'''), sep=r' +', engine='python')
dfs = [df1, df2]
# Pass a list, not a generator: recent NumPy versions reject generators here.
out = np.vstack([df.iloc[:, -1].values for df in dfs])
Result:
In [10]: out
Out[10]:
array([[ nan, 0.000611, -0.000244, -0.000122],
[ nan, 0.000811, -0.000744, -0.000322]])
Upvotes: 1