Reputation: 26067
I'm not sure how to figure this out, as I'm less familiar with NumPy than with pandas. I have a nested array and I would like to extract a specific column. For example, given this dataframe:
      MPG  Cylinders  Displacement  Horsepower  Weight  Acceleration  Model Year  Origin                                        NumpyColumn
0    18.0          8         307.0       130.0  3504.0          12.0          70       1  [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
1    15.0          8         350.0       165.0  3693.0          11.5          70       1  [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
2    18.0          8         318.0       150.0  3436.0          11.0          70       1  [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
3    16.0          8         304.0       150.0  3433.0          12.0          70       1  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
4    17.0          8         302.0       140.0  3449.0          10.5          70       1  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
...   ...        ...           ...         ...     ...           ...         ...     ...                                                ...
393  27.0          4         140.0        86.0  2790.0          15.6          82       1  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
394  44.0          4          97.0        52.0  2130.0          24.6          82       2  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
395  32.0          4         135.0        84.0  2295.0          11.6          82       1  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
396  28.0          4         120.0        79.0  2625.0          18.6          82       1  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
397  31.0          4         119.0        82.0  2720.0          19.4          82       1  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
If I want to separate the nested array from the other items, I would do this:
everythingExceptNumpyArray = df.drop(columns='NumpyColumn').to_numpy()
onlyNumpyArray = np.array(df['NumpyColumn'].tolist())
But I'm not sure how to do this if the above df is already a NumPy array. So given:
array([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0,
0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 9921.0,
20.0, 0.40457918757980704, 0.11369258150627903, 0.868421052631579,
0.47368421052631576, 0.894736842105263, 0.06688034531010473,
0.16160188713280013, 0.7368421052631579, 0.1673332894736842,
0.2099143206854345, 0.3690644464300929, 0.07097828135799109,
0.8157894736842104, 0.9210526315789473, 0.23091420289239645,
0.08623506024464939, 0.5789473684210527, 0.763157894736842, 0.0,
0.18421052631578946, 0.07949239000059796, 0.18763907099960708,
0.7368421052631579, 0.2668740256483197, 0.6842105263157894,
0.13699219747488295, 0.868421052631579, 0.868421052631579,
0.052631349139178094, 0.6842105263157894, 0.5526315789473684,
0.6842105263157894, 0.6842105263157894, 0.6842105263157894,
0.7105263157894737, 0.7105263157894737, 0.7105263157894737,
0.23684210526315788, 0.0, 0.7105263157894737, 0.5789473684210527,
0.763157894736842, 0.5263157894736842, 0.6578947368421052,
0.6842105263157894, 0.7105263157894737, 0.0, 0.5789473684210527,
0.2631578947368421, 0.6842105263157894, 0.6578947368421052,
0.42105263157894735, 0.5789473684210527, 0.42105263157894735,
0.7368421052631579, 0.7368421052631579, 0.15207999030227856,
0.8445892232119124, 0.2683721567016762, 0.3142850329243405,
0.18421052631578946, 0.19132292433056333, 0.20615136344079915,
0.14475710664724623, 0.1624920232728424, 0.6989826700898587,
0.18421052631578946, 0.21052631578947367, 0.4793448772543646,
0.7894736842105263, 0.682967263567459, 0.37139592674256894,
0.21123755190149363, 0.18421052631578946, 0.6578947368421052,
0.39473684210526316, 0.631578947368421, 0.7894736842105263,
0.36842105263157887, 0.1863353145721346, 0.7368421052631579,
0.26809396092240706, 0.22492185003691062, 0.1460488284639197,
0.631578947368421, 0.15347526114630458, 0.763157894736842,
0.2097323620058104, 0.3684210526315789, 0.631578947368421,
0.631578947368421, 0.631578947368421, 0.6842105263157894,
0.36842105263157887, 0.10507952765043811, 0.22418515695024185,
0.23755698619020282, 0.22226500126902, 0.530004040377794,
0.3421052631578947, 0.19018711711349692, 0.19629244102133708,
0.5789473684210527, 0.10526315789473684, 0.49999999999999994,
0.5263157894736842, 0.5263157894736842, 0.49999999999999994,
0.1052631578947368, 0.10526315789473678, 0.5263157894736842,
0.4736842105263157, 2013.0,
array([0. , 0. , 0. , 0.62235785, 0. ,
0.27049118, 0. , 0.31094068, 0. , 0. ,
0. , 0. , 0. , 0.4330532 , 0. ,
0. , 0.2515796 , 0. , 0. , 0. ,
0.40683705, 0.01569915, 0. , 0. , 0. ,
0.13090582, 0. , 0.49955425, 0.06970194, 0.29155406,
0. , 0. , 0.27342197, 0. , 0. ,
0. , 0.04415211, 0. , 0.03908829, 0. ,
0.07673171, 0.33199945, 0. , 0.51759815, 0. ,
0.4719149 , 0.4538082 , 0.13475986, 0. , 0. ,
0. , 0. , 0. , 0. , 0.08000553,
0. , 0.02991109, 0. , 0.5051543 , 0. ,
0.24663273, 0. , 0.50839704, 0. , 0. ,
0.05281948, 0.44884402, 0. , 0.44542992, 0.15376966,
0. , 0. , 0. , 0.39128256, 0.49497205,
0. , 0. ], dtype=float32)
What can I do to get a similar result as above but if the data is already a NumPy array?
Upvotes: 0
Views: 126
Reputation: 231738
Making a sample dataframe:
In [61]: data = pd.DataFrame([[1,2,np.array([3,4])],[5,6,np.array([7,8])]])
In [62]: data
Out[62]:
   0  1       2
0  1  2  [3, 4]
1  5  6  [7, 8]
In [63]: data[2]
Out[63]:
0 [3, 4]
1 [7, 8]
Name: 2, dtype: object
The numpy array extraction of the series:
In [65]: data[2].to_numpy()
Out[65]: array([array([3, 4]), array([7, 8])], dtype=object)
and list version:
In [66]: data[2].to_list()
Out[66]: [array([3, 4]), array([7, 8])]
If all those subarrays have the same shape, we can combine them with stack (or vstack):
In [67]: np.stack(data[2].to_list())
Out[67]:
array([[3, 4],
       [7, 8]])
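The same idea should carry over when the data is already a NumPy array rather than a dataframe. The following is only a sketch under assumptions: it supposes the data is a 2-D object-dtype array whose last column holds one array per row, and that those arrays all have the same shape. The names rows, arr, everything_except_nested and only_nested are made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for the question's data: a 2-D object array in which
# the last column of every row holds a NumPy array (assumed layout).
rows = [
    (1.0, 2.0, np.array([3.0, 4.0], dtype=np.float32)),
    (5.0, 6.0, np.array([7.0, 8.0], dtype=np.float32)),
]
arr = np.empty((len(rows), 3), dtype=object)
for i, row in enumerate(rows):
    for j, value in enumerate(row):
        # Element-wise assignment stores each nested array as a single object.
        arr[i, j] = value

# Everything except the nested column, converted to a regular float array.
everything_except_nested = arr[:, :-1].astype(float)

# The nested column: stack the per-row arrays into one 2-D array.
# This only works if every nested array has the same shape.
only_nested = np.stack(list(arr[:, -1]))

print(everything_except_nested.shape, only_nested.shape)
```

If the nested arrays sit in a different column, or a row is a 1-D object array instead, the slices would need adjusting (for example row[:-1] and row[-1] for a single row).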
And a more obnoxious mix of elements:
In [71]: data = pd.DataFrame([[1,2,np.array([3,4])],[5,6,[7,8]],[10,11,"[12, 13]"],[12,13,np.nan]])
In [72]: data
Out[72]:
    0   1         2
0   1   2    [3, 4]
1   5   6    [7, 8]
2  10  11  [12, 13]
3  12  13       NaN
In [73]: data[2].to_list()
Out[73]: [array([3, 4]), [7, 8], '[12, 13]', nan]
Note that the dataframe display doesn't give much indication of the different types.
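Since the display hides the types, one way to check what a column really contains is to look at the type of each element. A minimal sketch, using the list printed in Out[73] directly (the names values and type_names are just for illustration):

```python
import numpy as np

# The same mix of elements as in Out[73]: an array, a list, a string and a NaN.
values = [np.array([3, 4]), [7, 8], "[12, 13]", np.nan]

# Collect the type name of each element to see what is really stored.
type_names = [type(v).__name__ for v in values]
print(type_names)  # ['ndarray', 'list', 'str', 'float']
```

Only when every element turns out to be an array (or list) of the same shape will stack combine them cleanly.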
Upvotes: 1