Reputation: 17
For example I have these numpy arrays:
import pandas as pd
import numpy as np
# Points can be n-dimensional, so I need a solution that covers that case.
# I also need to compute distances between points, so flattening the data
# is not an option.
points = np.array([[1, 2], [2, 1], [100, 100], [-2, -1], [0, 0], [-1, -2]]) # a 2d numpy array containing points in space
labels = np.array([0, 1, 1, 1, 0, 0]) # the labels of the points (not necessarily only 0 and 1)
I tried to build a dictionary and create the pandas DataFrame from it:
my_dict = {'point': points, 'label': labels}
df = pd.DataFrame(my_dict, columns=['point', 'label'])
But it didn't work and I got the following exception:
Exception: Data must be 1-dimensional
Probably it's because of the numpy array of points (a 2d numpy array).
The desired result (labels taken from the labels array above):
        point  label
0      [1, 2]      0
1      [2, 1]      1
2  [100, 100]      1
3    [-2, -1]      1
4      [0, 0]      0
5    [-1, -2]      0
Thanks in advance to everyone who helps :)
Upvotes: 0
Views: 1141
Reputation: 117771
You should always try to normalize your data so that each column contains only scalar values, not values that themselves have a dimension.
In this case, I would do something like this:
>>> df = pd.DataFrame({'x': points[:, 0], 'y': points[:, 1], 'label': labels},
...                   columns=['x', 'y', 'label'])
>>> df
     x    y  label
0    1    2      0
1    2    1      1
2  100  100      1
3   -2   -1      1
4    0    0      0
5   -1   -2      0
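Since the question mentions n-dimensional points, the same idea can be sketched without hard-coding x and y. This is only an illustration: the column names x0, x1, ... and the variable names coord_cols, coords, and d01 are made up here, and it assumes points is a 2-D array with one row per point.

```python
import numpy as np
import pandas as pd

points = np.array([[1, 2], [2, 1], [100, 100], [-2, -1], [0, 0], [-1, -2]])
labels = np.array([0, 1, 1, 1, 0, 0])

# One scalar column per coordinate; the names x0, x1, ... are hypothetical.
coord_cols = [f'x{i}' for i in range(points.shape[1])]
df = pd.DataFrame(points, columns=coord_cols)
df['label'] = labels

# Distances are still easy to compute: pull the coordinate columns
# back out as a 2-D array and work on its rows.
coords = df[coord_cols].to_numpy()
d01 = np.linalg.norm(coords[0] - coords[1])  # Euclidean distance, rows 0 and 1
```

Keeping the coordinates in their own columns means the points are never flattened into a single value, and the coordinate block can be recovered as an array whenever a distance is needed.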
If you truly insist on keeping each point as a single value, convert the points to a list of lists or a list of tuples before passing them to pandas to avoid this error.
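A minimal sketch of that conversion, assuming the points and labels arrays from the question; the names restored and dist_from_origin are only illustrative:

```python
import numpy as np
import pandas as pd

points = np.array([[1, 2], [2, 1], [100, 100], [-2, -1], [0, 0], [-1, -2]])
labels = np.array([0, 1, 1, 1, 0, 0])

# tolist() turns the 2-D array into a plain list of lists, so pandas
# stores each point as one Python object in an object-dtype column.
df = pd.DataFrame({'point': points.tolist(), 'label': labels},
                  columns=['point', 'label'])

# To do numeric work later, stack the column back into a 2-D array first.
restored = np.array(df['point'].tolist())
dist_from_origin = np.linalg.norm(restored, axis=1)
```

Note that the point column holds Python lists, so vectorized pandas operations will not apply to it directly; that is the trade-off compared with one column per coordinate.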
Upvotes: 0