Building a numpy array (matrix) from several dataframes

Question

I have several dataframes with the same structure but different data.

DataFrame 1

                          bid
                        close
time                         
2016-05-24 00:00:00       NaN
2016-05-24 00:05:00  0.000611
2016-05-24 00:10:00 -0.000244
2016-05-24 00:15:00 -0.000122

DataFrame 2

                          bid
                        close
time                         
2016-05-24 00:00:00       NaN
2016-05-24 00:05:00  0.000811
2016-05-24 00:10:00 -0.000744
2016-05-24 00:15:00 -0.000322

I need to build a list of these dataframes and pass it to a function that converts the list to a numpy array. Each row of the resulting matrix should hold the values of one dataframe's ('bid', 'close') column. Note that I don't need the 'time' index.

data = np.array([dataFrames])

returns this (example not actual data)

[[-0.00114415  0.02502565  0.00507831 ...,  0.00653057  0.02183072
  -0.00194293]   <- DataFrame 1 (ignore that the data doesn't match the above)
 [-0.01527224  0.02899528 -0.00327654 ...,  0.0322364   0.01821731
  -0.00766773]   <- DataFrame 2 (ignore that the data doesn't match the above)
 ....]]

hilberts_drinking_problem · Accepted Answer

If each row in the final matrix corresponds to the same date, try

master_matrix = pd.concat(list_of_dfs, axis=1)
master_matrix = master_matrix.values.reshape(master_matrix.shape, order='F')

Otherwise, use

master_matrix = pd.concat(list_of_dfs, axis=1).values
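As a rough illustration of what the pd.concat call produces, here is a minimal sketch. The frames are rebuilt by hand from the question's sample data, and the names idx, cols, by_column and by_row are made up for this example. It assumes all frames share the same index; the concatenated values have one column per dataframe, so a transpose is needed to get one row per dataframe as in the question's desired output.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's frames:
# a datetime index and a single ('bid', 'close') column.
idx = pd.date_range("2016-05-24 00:00:00", periods=4, freq="5min")
cols = pd.MultiIndex.from_tuples([("bid", "close")])
df1 = pd.DataFrame([np.nan, 0.000611, -0.000244, -0.000122], index=idx, columns=cols)
df2 = pd.DataFrame([np.nan, 0.000811, -0.000744, -0.000322], index=idx, columns=cols)
list_of_dfs = [df1, df2]

# Side-by-side concatenation aligns on the index: one column per dataframe.
by_column = pd.concat(list_of_dfs, axis=1).values  # shape (4, 2)

# Transpose to get one row per dataframe.
by_row = by_column.T  # shape (2, 4)

print(by_column.shape, by_row.shape)
```

If the frames do not share an index, pd.concat aligns on the union of the index labels and fills the gaps with NaN, so the row count can grow beyond that of any single frame.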

Edit to address the newly added example. In this case, you can use np.vstack on columns returned from each dataframe.

import pandas as pd
import numpy as np
from io import StringIO

df1 = pd.read_csv(StringIO(
'''
time                bid_close
2016-05-24 00:00:00       NaN
2016-05-24 00:05:00  0.000611
2016-05-24 00:10:00 -0.000244
2016-05-24 00:15:00 -0.000122
'''), sep=r' +', engine='python')

df2 = pd.read_csv(StringIO(
'''
time                bid_close
2016-05-24 00:00:00       NaN
2016-05-24 00:05:00  0.000811
2016-05-24 00:10:00 -0.000744
2016-05-24 00:15:00 -0.000322
'''), sep=r' +', engine='python')

dfs = [df1, df2]

# np.vstack expects a sequence such as a list; recent NumPy versions reject a bare generator
out = np.vstack([df.iloc[:, -1].values for df in dfs])

Result:

In [10]: out
Out[10]:
array([[      nan,  0.000611, -0.000244, -0.000122],
       [      nan,  0.000811, -0.000744, -0.000322]])
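The question asks for a function that takes the list of dataframes, and its frames have a two-level ('bid', 'close') column rather than a flat bid_close column. A sketch under those assumptions follows. The function name frames_to_matrix and its column parameter are hypothetical, and the frames are rebuilt by hand from the question's sample data. It selects the column by name instead of by position, which may be safer if a frame has extra columns.

```python
import numpy as np
import pandas as pd

def frames_to_matrix(frames, column=("bid", "close")):
    """Stack one column from each dataframe into a 2-D array, one row per frame.

    Hypothetical helper: assumes every frame has the given column
    and that all frames have the same number of rows.
    """
    return np.vstack([df[column].to_numpy() for df in frames])

# Hypothetical reconstruction of the question's frames.
idx = pd.date_range("2016-05-24 00:00:00", periods=4, freq="5min")
cols = pd.MultiIndex.from_tuples([("bid", "close")])
df1 = pd.DataFrame([np.nan, 0.000611, -0.000244, -0.000122], index=idx, columns=cols)
df2 = pd.DataFrame([np.nan, 0.000811, -0.000744, -0.000322], index=idx, columns=cols)

data = frames_to_matrix([df1, df2])
print(data.shape)  # one row per dataframe, one column per timestamp
```

The 'time' index is dropped automatically because to_numpy() returns only the column's values.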
