aph

Reputation: 1855

Pandas DataFrame with columns for arrays of different dimension

I have a set of data associated with Npts points. Some of the data are scalar values, such as color, and some are multi-dimensional, such as 3D position. I am trying to bundle this data into a pandas data structure, but I get a variety of error messages depending on how I try to do it.

Here's some mock data:

import numpy as np
import pandas

Npts = 100
pos = np.random.uniform(0, 250, Npts*3).reshape(Npts, 3)
colors = np.random.uniform(-1, 1, Npts)

Using a dictionary as input, the color data alone bundles up into a DataFrame just fine:

df_colors = pandas.DataFrame({'colors':colors})

But the position information does not:

df_pos = pandas.DataFrame({'pos':pos})

This returns the following unhelpful error message:

ValueError: If using all scalar values, you must must pass an index

And what I really want to do is bundle both position and color information together:

df_agg = pandas.DataFrame({'pos':pos, 'colors':colors})

But this does not work, and returns the following equally cryptic error:

Exception: Data must be 1-dimensional

Surely it is possible to bundle multi-dimensional data with pandas, as well as data with mixed dimension. Does anyone know the API for this behavior?

Upvotes: 1

Views: 3849

Answers (1)

andrew

Reputation: 4089

The problem is that pos has shape (100, 3). A DataFrame column has to be 1-dimensional, so each value in the dictionary needs to be an array of shape (100,).
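To see the mismatch concretely, here is a small sketch that checks the shapes involved. It assumes the mock data from the question, regenerated here so the snippet runs on its own; the name x is just for illustration.

```python
import numpy as np

Npts = 100
pos = np.random.uniform(0, 250, (Npts, 3))
colors = np.random.uniform(-1, 1, Npts)

# colors is 1-dimensional, so it maps directly onto one column
print(colors.ndim, colors.shape)   # 1 (100,)

# pos is 2-dimensional: one row per point, one column per coordinate
print(pos.ndim, pos.shape)         # 2 (100, 3)

# slicing out a single coordinate gives a 1-dimensional array again
x = pos[:, 0]
print(x.ndim, x.shape)             # 1 (100,)
```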

One option is to create an individual column for each of the dimensions:

df_agg = pandas.DataFrame({'posX':pos[:,0], 'posY':pos[:,1], 'posZ':pos[:,2], 'colors':colors})
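If writing out each slice by hand gets tedious, one alternative (a sketch, not the only way) is to let pandas split the 2-D array into columns and then attach the scalar data. The column names posX, posY, posZ follow the line above; df_alt and pos_back are names made up for this example.

```python
import numpy as np
import pandas

Npts = 100
pos = np.random.uniform(0, 250, (Npts, 3))
colors = np.random.uniform(-1, 1, Npts)

# A 2-D array is accepted when passed directly (not inside a dict);
# each of its columns becomes a DataFrame column.
df_alt = pandas.DataFrame(pos, columns=['posX', 'posY', 'posZ'])

# Add the 1-D scalar data as one more column.
df_alt['colors'] = colors

# The position block can be recovered as an (Npts, 3) array.
pos_back = df_alt[['posX', 'posY', 'posZ']].to_numpy()
```

This keeps every column numeric, so vectorized operations on individual coordinates still work.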

Another option is to cast each coordinate into a 3-tuple:

posTuple = tuple(map(tuple,pos))
df_aggV2 = pandas.DataFrame({'pos':posTuple, 'colors':colors})
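With the tuple approach, the positions live in a single object-dtype column, so numeric work needs the array rebuilt first. A minimal sketch of the round trip, assuming the same mock data; a list of tuples is used here rather than a tuple of tuples, and pos_back is a name made up for this example.

```python
import numpy as np
import pandas

Npts = 100
pos = np.random.uniform(0, 250, (Npts, 3))
colors = np.random.uniform(-1, 1, Npts)

# One 3-tuple per point, so the 'pos' column has Npts entries
posTuple = list(map(tuple, pos))
df_aggV2 = pandas.DataFrame({'pos': posTuple, 'colors': colors})

# Each cell of 'pos' is a Python tuple; the column dtype is object
print(df_aggV2['pos'].dtype)        # object

# Rebuild the (Npts, 3) float array from the column of tuples
pos_back = np.array(df_aggV2['pos'].tolist())
print(pos_back.shape)               # (100, 3)
```

The trade-off is that an object column cannot be used in vectorized arithmetic directly, so this layout fits best when the position is mostly carried along rather than computed on.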

Upvotes: 1
