Niv

Reputation: 17

Creating a pandas dataframe from a 2d numpy array (to be a column of 1d numpy arrays) and a 1d np array of labels

For example I have these numpy arrays:

import pandas as pd
import numpy as np

# Points could be n-dimensional, so I need a solution that covers that case.
# I also need to calculate distances between points, so flattening the data
# is not my goal.
points = np.array([[1, 2], [2, 1], [100, 100], [-2, -1], [0, 0], [-1, -2]])  # a 2d numpy array containing points in space
labels = np.array([0, 1, 1, 1, 0, 0])  # the labels of the points (not necessarily only 0 and 1)

I tried to build a dictionary and create the pandas DataFrame from it:

my_dict = {'point': points, 'label': labels}
df = pd.DataFrame(my_dict, columns=['point', 'label'])

But it didn't work and I got the following exception:

Exception: Data must be 1-dimensional

Probably it's because of the numpy array of points (a 2d numpy array).

The desired result:

        point  label
0      [1, 2]      0
1      [2, 1]      1
2  [100, 100]      1
3    [-2, -1]      1
4      [0, 0]      0
5    [-1, -2]      0

Thanks in advance to everyone who helps :)

Upvotes: 0

Views: 1141

Answers (1)

orlp

Reputation: 117771

You should generally try to normalize your data so that each column contains only scalar values, not values that have a dimension of their own (such as arrays).

In this case, I would do something like this:

>>> df = pd.DataFrame({'x': points[:,0], 'y': points[:, 1], 'label': labels},
                      columns=['x', 'y', 'label'])
>>> df
     x    y  label
0    1    2      0
1    2    1      1
2  100  100      1
3   -2   -1      1
4    0    0      0
5   -1   -2      0
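
Since the question mentions points of arbitrary dimension, the same idea can be extended to one column per coordinate. This is only a sketch: the column names `x0`, `x1`, ... and the variable names `coord_cols`, `coords`, and `dist_from_first` are made up for illustration.

```python
import numpy as np
import pandas as pd

points = np.array([[1, 2], [2, 1], [100, 100], [-2, -1], [0, 0], [-1, -2]])
labels = np.array([0, 1, 1, 1, 0, 0])

# One scalar column per coordinate; the names x0, x1, ... are arbitrary.
coord_cols = [f"x{i}" for i in range(points.shape[1])]
df = pd.DataFrame(points, columns=coord_cols)
df["label"] = labels

# The coordinates can be pulled back out as a 2d array whenever
# distances are needed, so nothing is lost by splitting them up.
coords = df[coord_cols].to_numpy()
dist_from_first = np.linalg.norm(coords - coords[0], axis=1)
```

Here `dist_from_first` holds the Euclidean distance from every point to the first one, computed on the coordinate columns only.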

If you truly insist on keeping each point as a single value, convert the array to a list of lists or a list of tuples before passing it to pandas; this avoids the error.
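
A minimal sketch of that conversion, assuming `points.tolist()` is an acceptable way to get a list of lists (the names `df_points`, `a`, `b`, and `dist` are chosen here for illustration):

```python
import numpy as np
import pandas as pd

points = np.array([[1, 2], [2, 1], [100, 100], [-2, -1], [0, 0], [-1, -2]])
labels = np.array([0, 1, 1, 1, 0, 0])

# tolist() turns the 2d array into a list of lists, so each inner list
# becomes one cell of an object-dtype column.
df_points = pd.DataFrame({"point": points.tolist(), "label": labels})

# Cells hold plain Python lists, so convert back to arrays before doing math.
a = np.asarray(df_points.loc[0, "point"])
b = np.asarray(df_points.loc[1, "point"])
dist = np.linalg.norm(a - b)  # Euclidean distance between rows 0 and 1
```

Note that the `point` column has dtype `object`, so vectorized numeric operations on it will not work directly; that is the main cost of this layout compared with one column per coordinate.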

Upvotes: 0
