Reputation: 2573
I'm really intrigued by the axis indexing that pandas provides. I've been working with numpy lately and have an array that stores the position (XYZ) of a number of particles (1 ... N) at a number of times (0.0 ... T). So that is a three-dimensional (T, N, 3) array.
import numpy as np
D = np.random.random((10, 20, 3))
Now I'd like to attach pandas indexing to the appropriate axes to make it easier to access certain time frames or a certain selection of atoms. Let's say I'd like to attach the following index labels to the data:
T_index = np.arange(10, dtype='f')
N_index = np.arange(20)
P_index = ["x", "y", "z"]
I've looked around but have not found a convenient way of adding those to a pandas DataFrame. I'm also not sure whether a pandas DataFrame is really the data structure I should be using, because it might break up the originally nicely formed numpy ndarray into something where convenient numpy functions like mean() or sum() would be much slower.
Upvotes: 1
Views: 107
Reputation: 880547
Since you have 3 axes, defining a Panel might be most convenient:
pan = pd.Panel(D, items=T_index, major_axis=N_index, minor_axis=P_index)
# <class 'pandas.core.panel.Panel'>
# Dimensions: 10 (items) x 20 (major_axis) x 3 (minor_axis)
# Items axis: 0.0 to 9.0
# Major_axis axis: 0 to 19
# Minor_axis axis: x to z
Then, if you wish to convert that to a DataFrame, use:
df = pan.to_frame()
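Note that pd.Panel was deprecated in pandas 0.20 and removed in 0.25, so the lines above only run on older versions. As a rough sketch of one alternative on current pandas (not the layout that pan.to_frame() produces), the array can be flattened into a DataFrame with a (time, particle) MultiIndex and x/y/z columns. The level names "time" and "particle" are made up for illustration, and the time labels are float64 here rather than float32:

```python
import numpy as np
import pandas as pd

# Example data shaped (T, N, 3), as in the question.
rng = np.random.default_rng(0)
D = rng.random((10, 20, 3))

T_index = np.arange(10, dtype=float)  # time labels (float64 in this sketch)
N_index = np.arange(20)               # particle labels
P_index = ["x", "y", "z"]             # coordinate labels

# One row per (time, particle) pair; the level names are illustrative only.
index = pd.MultiIndex.from_product(
    [T_index, N_index], names=["time", "particle"]
)

# reshape(-1, 3) keeps C order, so row t * 20 + n holds D[t, n, :],
# which matches the ordering produced by from_product.
df = pd.DataFrame(D.reshape(-1, 3), index=index, columns=P_index)

# Select one time frame (all particles at time 2.0).
frame_2 = df.xs(2.0, level="time")

# Select one particle across all times.
particle_5 = df.xs(5, level="particle")

# Average position over all particles, per time step.
mean_per_time = df.groupby(level="time").mean()
```

Here frame_2 is a 20 x 3 frame indexed by particle, and particle_5 is a 10 x 3 frame indexed by time.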
The underlying data in pan is still in one numpy array of shape (10, 20, 3):
In [50]: pan._data
BlockManager
...
FloatBlock: [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0], 10 x 20 x 3, dtype: float64
So I wouldn't expect there to be any significant deterioration in speed. And you could always drop back to numpy operations on the numpy array pan.values if need be, though, hopefully, that would be unnecessary.
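The same "drop back to numpy" idea can be sketched without Panel. This assumes the data lives in a DataFrame with a (time, particle) MultiIndex and x/y/z columns, which is an assumed layout and not something from the answer above. The values are reshaped back to (T, N, 3) and reduced with plain numpy:

```python
import numpy as np
import pandas as pd

# Assumed layout: rows indexed by (time, particle), columns x/y/z.
D = np.random.default_rng(0).random((10, 20, 3))
index = pd.MultiIndex.from_product(
    [np.arange(10), np.arange(20)], names=["time", "particle"]
)
df = pd.DataFrame(D.reshape(-1, 3), index=index, columns=["x", "y", "z"])

# Recover the dense (T, N, 3) array. This relies on the rows being
# complete and sorted by (time, particle), as from_product guarantees.
n_times = df.index.levels[0].size
n_particles = df.index.levels[1].size
arr = df.to_numpy().reshape(n_times, n_particles, 3)

# Ordinary numpy reductions on the recovered array.
mean_over_time = arr.mean(axis=0)  # shape (20, 3): per-particle mean position
centroid = arr.mean(axis=1)        # shape (10, 3): per-time centroid
```

If rows were dropped or reordered in between, the reshape would no longer line up with the labels, so sorting the index first would be a sensible precaution.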
Upvotes: 2