Reputation: 1580
I have a pandas Series whose elements are arrays of pairs:
In [177]: pair_arrays
Out[177]:
15192 [[1, 9], [2, 14], [4, 1], [5, 36], [6, 8], [7,...
16012 [[0, 107], [1, 42], [2, 22], [3, 59], [4, 117]...
17523 [[0, 44], [1, 36], [2, 43], [3, 28], [4, 52], ...
...
I would like to reshape this into a DataFrame with two columns, 'x' and 'y', with one row per pair, similar to:
In [179]: pd.DataFrame([{'x':1, 'y':42}, {'x':4, 'y':12}], columns=['x', 'y'])
Out[179]:
x y
0 1 42
1 4 12
...
How do I do this?
Upvotes: 2
Views: 2646
Reputation: 10298
Assuming that each element in the series is an array of pairs, and each pair is a sequence, this should work:
import numpy as np
import pandas as pd
pair_df = pd.DataFrame(np.vstack(pair_arrays.values), columns=['x', 'y'])
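The key point is that pandas doesn't know how to expand a Series of nested sequences into columns on its own. So what I am doing here is taking the underlying numpy object array, which has one element per row, each element being that row's array of pairs. Then I stack those elements vertically, which gets you a single 2D integer array with one row per pair, and then I convert that back into a DataFrame.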
Technically you don't currently need to use the values attribute to explicitly convert to a numpy array, but I think that is clearer and potentially safer long-term.
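A minimal sketch of the steps described above, using a small made-up Series in place of the real pair_arrays (the values and index labels here are hypothetical), so each intermediate result can be inspected:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data shaped like pair_arrays:
# each element holds a variable-length list of [x, y] pairs
pair_arrays = pd.Series(
    [[[1, 9], [2, 14]], [[0, 107], [1, 42], [2, 22]]],
    index=[15192, 16012],
)

# .values is a 1-D object array with one element per row of the Series
obj_arr = pair_arrays.values
print(obj_arr.dtype, obj_arr.shape)  # object (2,)

# vstack turns each element into a 2-D (n_i, 2) array and concatenates them row-wise
stacked = np.vstack(obj_arr)
print(stacked.shape)  # (5, 2)

# Wrap the 2-D integer array back into a DataFrame with the two named columns
pair_df = pd.DataFrame(stacked, columns=['x', 'y'])
print(pair_df)
```

Note that the result gets a fresh 0..n-1 index, so the original index labels (15192, 16012, ...) are not carried over. On pandas 0.24 and later, pair_arrays.to_numpy() is the documented alternative to .values.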
Upvotes: 3
Reputation: 1580
I can do it in plain Python by flattening the nested lists first:
pd.DataFrame(
[item for sublist in pair_arrays.tolist() for item in sublist],
columns=['x', 'y']
)
This works for my use case, but going through a Python list comprehension like that is probably not ideal.
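If you also want to keep track of which original row each pair came from, one possible alternative (not taken from the answers above) is Series.explode, available in pandas 0.25 and later. A rough sketch, again on hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data in the same shape as pair_arrays
pair_arrays = pd.Series(
    [[[1, 9], [2, 14]], [[0, 107], [1, 42], [2, 22]]],
    index=[15192, 16012],
)

# explode() gives one row per pair, repeating the original index label for each pair
pairs = pair_arrays.explode()

# Each exploded element is a single [x, y] pair; split them into two columns,
# reusing the repeated index so every row still points at its source element
pair_df = pd.DataFrame(pairs.tolist(), columns=['x', 'y'], index=pairs.index)
print(pair_df)
```

This still builds an intermediate Python list via tolist(), so it is not necessarily faster than the list comprehension; what it adds is that each row keeps the index label of the Series element it came from. It assumes no element is an empty array: explode turns empty list-likes into NaN, which would likely break the split into two columns.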
Upvotes: 0