Reputation: 43
First post to Stack Overflow. I have searched and cannot find an answer to this.
I have a Pandas Series of 2D numpy arrays:
import numpy as np
import pandas as pd
x1 = np.array([[0,1],[2,3],[3,4]],dtype=np.uint8)
x2 = np.array([[5,6],[7,8],[9,10]],dtype=np.uint8)
S = pd.Series(data=[x1,x2],index=['a','b'])
The output S should look like:
a [[0, 1], [2, 3], [3, 4]]
b [[5, 6], [7, 8], [9, 10]]
I wish to have it transformed into a Pandas DataFrame D where each column of the 2D numpy array in S becomes a 1D numpy array in a column of D:
D should look like:
0 1
a [0,2,3] [1,3,4]
b [5,7,9] [6,8,10]
Note: my actual data set is 1,238,500 arrays of shape (32, 8), so I was trying to avoid iterating over rows.
What is an efficient way to do this?
Upvotes: 4
Views: 1227
Reputation: 4233
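One solution with np.stack and map. In Python 3, map returns an iterator, and recent NumPy versions require a sequence, so the result is wrapped in list() before it is passed to np.stack:
df = pd.DataFrame(np.stack(list(map(np.transpose, S))).tolist(), index=S.index)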
print (df)
0 1
a [0, 2, 3] [1, 3, 4]
b [5, 7, 9] [6, 8, 10]
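If the cells should stay as NumPy arrays rather than Python lists, one possible variant is to stack once and slice out each column. This is only a minimal sketch, assuming every array in S has the same shape; the names stacked and df_arrays are illustrative and not part of the answer above.

```python
import numpy as np
import pandas as pd

x1 = np.array([[0, 1], [2, 3], [3, 4]], dtype=np.uint8)
x2 = np.array([[5, 6], [7, 8], [9, 10]], dtype=np.uint8)
S = pd.Series(data=[x1, x2], index=['a', 'b'])

# Shape (len(S), array_rows, array_cols); assumes identical shapes throughout S.
stacked = np.stack(list(S))

# For output column j, stacked[:, :, j] holds one 1D slice per row of S.
# list() turns it into one array per row, stored as object cells.
df_arrays = pd.DataFrame({
    j: pd.Series(list(stacked[:, :, j]), index=S.index)
    for j in range(stacked.shape[2])
})
```

This still creates one Python object per cell, so for a very large input it may be worth keeping the single 3D array and indexing it directly instead.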
Upvotes: 3
Reputation: 11602
You can split and squeeze without ever converting the last dimension to a python list.
df = S.apply(np.split, args=[2, 1]).apply(pd.Series).applymap(np.squeeze)
# 0 1
# a [0, 2, 3] [1, 3, 4]
# b [5, 7, 9] [6, 8, 10]
In args=[2, 1], 2 stands for the number of columns and 1 stands for the axis to slice across.
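To see what those two arguments do on a single array, here is a small sketch using x1 from the question (parts, shapes and squeezed are illustrative names). np.split leaves a trailing axis of length 1 on each piece, which is why np.squeeze is applied afterwards.

```python
import numpy as np

x1 = np.array([[0, 1], [2, 3], [3, 4]], dtype=np.uint8)

# Split into 2 equal sections along axis 1 (the columns).
parts = np.split(x1, 2, 1)

# Each piece is still 2D, with shape (3, 1).
shapes = [p.shape for p in parts]

# np.squeeze drops the length-1 axis, leaving 1D arrays of shape (3,).
squeezed = [np.squeeze(p) for p in parts]
```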
Types:
In [280]: df.applymap(type)
Out[280]:
0 1
a <class 'numpy.ndarray'> <class 'numpy.ndarray'>
b <class 'numpy.ndarray'> <class 'numpy.ndarray'>
Upvotes: 1
Reputation: 21709
I would do it like this:
# flatten the list
S = S.apply(lambda x: [i for s in x for i in s])
# pick alternate values and create a data frame
S = S.apply(lambda x: [x[::2], x[1::2]]).reset_index()[0].apply(pd.Series)
# name index
S.index = ['a','b']
0 1
a [0, 2, 3] [1, 3, 4]
b [5, 7, 9] [6, 8, 10]
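The "pick alternate values" step relies on the arrays having exactly two columns. A sketch of how the same idea might extend to any number of columns, assuming row-major flattening (n_cols, flat and columns are illustrative names, not part of the answer above):

```python
import numpy as np

x2 = np.array([[5, 6], [7, 8], [9, 10]], dtype=np.uint8)
n_cols = x2.shape[1]

# Row-major flattening: [5, 6, 7, 8, 9, 10]
flat = x2.ravel()

# Every n_cols-th value starting at offset j belongs to column j,
# so flat[j::n_cols] matches x2[:, j].
columns = [flat[j::n_cols] for j in range(n_cols)]
```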
Upvotes: 0