Magellan88

Reputation: 2573

How to create an index on an ndarray using pandas

I'm really intrigued by the axis indexing that pandas provides. I've been working with numpy lately and have an array that holds the positions (XYZ) of a number of particles (1 ... N) at a number of times (0.0 ... T), so it is a three-dimensional (T, N, 3) array.

import numpy as np
D = np.random.random((10, 20, 3))

Now I'd like to attach pandas indexing to the appropriate axes to make it easier to access certain time frames or certain selections of atoms. Let's say I'd like to attach the following index labels to the data:

T_index = np.arange(10, dtype='f')
N_index = np.arange(20)
P_index = ["x", "y", "z"]

I've looked around but have not found a convenient way of adding those to a pandas DataFrame. I'm also not sure whether a DataFrame is really the data structure I should be using, because it might break up the originally nicely formed numpy ndarray into something on which convenient numpy functions like mean() or sum() would be much slower.

Upvotes: 1

Views: 107

Answers (1)

unutbu

Reputation: 880547

Since you have 3 axes, defining a Panel might be most convenient:

import pandas as pd
pan = pd.Panel(D, items=T_index, major_axis=N_index, minor_axis=P_index)
# <class 'pandas.core.panel.Panel'>
# Dimensions: 10 (items) x 20 (major_axis) x 3 (minor_axis)
# Items axis: 0.0 to 9.0
# Major_axis axis: 0 to 19
# Minor_axis axis: x to z

Then, if you wish to convert that to a DataFrame, use:

df = pan.to_frame()
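Note that `pd.Panel` was deprecated in pandas 0.20 and removed in 0.25, so the calls above only run on older versions. On a current pandas, one possible substitute is a DataFrame with a MultiIndex on the rows. The following is a minimal sketch under some assumptions: the level names `time` and `particle` are made up for illustration, the time labels are plain float64 rather than `dtype='f'`, and the layout (rows labelled by time and particle, columns by coordinate) is not the same as what `pan.to_frame()` produced.

```python
import numpy as np
import pandas as pd

# Same shape as in the question: (T, N, 3)
rng = np.random.default_rng(0)
D = rng.random((10, 20, 3))
T_index = np.arange(10.0)      # float time labels (float64 here, an assumption)
N_index = np.arange(20)
P_index = ["x", "y", "z"]

# Rows are labelled by (time, particle); columns by coordinate.
# The level names are illustrative, not part of the original post.
row_index = pd.MultiIndex.from_product(
    [T_index, N_index], names=["time", "particle"]
)
df = pd.DataFrame(D.reshape(-1, 3), index=row_index, columns=P_index)

# One time frame: a (20, 3) frame indexed by particle
frame_2 = df.xs(2.0, level="time")

# One particle across all times: a (10, 3) frame indexed by time
particle_5 = df.xs(5, level="particle")
```

Because `D.reshape(-1, 3)` walks the array in C order, row `t * 20 + n` of the frame holds `D[t, n]`, which matches the order that `MultiIndex.from_product` generates.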

The underlying data in pan is still in one numpy array of shape (10, 20, 3):

In [50]: pan._data
BlockManager
...
FloatBlock: [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0], 10 x 20 x 3, dtype: float64

So I wouldn't expect there to be any significant deterioration in speed. And you could always drop back to numpy operations on the numpy array pan.values if need be, though, hopefully, that would be unnecessary.
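To illustrate the idea of dropping back to numpy, here is a hedged sketch that uses a MultiIndex DataFrame in place of the Panel (an assumption, since `pd.Panel` is not available in current pandas) and the hypothetical level names `time` and `particle`. It computes the mean position per time frame once with labels in pandas and once on the plain array recovered via `to_numpy()`.

```python
import numpy as np
import pandas as pd

D = np.random.default_rng(0).random((10, 20, 3))
index = pd.MultiIndex.from_product(
    [np.arange(10.0), np.arange(20)], names=["time", "particle"]
)
df = pd.DataFrame(D.reshape(-1, 3), index=index, columns=["x", "y", "z"])

# Labelled reduction in pandas: mean position over all particles, per time frame
mean_pd = df.groupby(level="time").mean()

# Same reduction on the plain ndarray, restored to its (T, N, 3) shape
arr = df.to_numpy().reshape(10, 20, 3)
mean_np = arr.mean(axis=1)
```

Both results have shape (10, 3) and should agree up to floating-point rounding; which route is faster depends on the data size and pandas version, so it is worth timing on your own array.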

Upvotes: 2
