Spon

Reputation: 47

3D numpy array into pandas

I'd like to transform my numpy array (shape=(27, 77, 77)):

array([[[1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        ...,
        [1., 1., 1., ..., 2., 2., 2.],
        [1., 1., 1., ..., 2., 2., 2.],
        [1., 1., 1., ..., 1., 2., 2.]],

       ...,

       [[1., 1., 1., ..., 1., 1., 0.],
        [1., 1., 1., ..., 1., 1., 0.],
        [1., 1., 1., ..., 1., 1., 0.],
        ...,
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.]]])

into a pandas DataFrame with one row per element and four columns: 'x' = index along axis 2 (to the right), 'y' = index along axis 1 (downwards), 'z' = index along axis 0 (the 27 "different" arrays) and 'v' = the value at that position, i.e. df.columns = ['x', 'y', 'z', 'v'].

I'm relatively new to Python. Do you know how I should code this?

Thanks!

Upvotes: 1

Views: 652

Answers (2)

Stefan B

Reputation: 1677

As a "simple" one-liner for an arbitrary number of dimensions:

>>> import itertools as it; import numpy as np; import pandas as pd

# analogous test data
>>> arr = np.random.rand(27, 77, 77)

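# np.nditer(arr) + v.item() avoids making an extra copy of the array;
# order='C' keeps the iteration in step with it.product even for non-C-contiguous arrays
# arr.flatten() is slightly faster but copies the array
# it.product yields indices in axis order (axis 0, axis 1, axis 2), i.e. z, y, x in the
# question's naming; use df[list('xyzv')] afterwards to reorder the columns
>>> df = pd.DataFrame(data=[(*axes, v.item()) for axes, v in zip(it.product(*[range(i) for i in arr.shape]), np.nditer(arr, order='C'))], columns=list('zyxv'))
>>> df
         z   y   x         v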
0        0   0   0  0.375027
1        0   0   1  0.511405
2        0   0   2  0.645937
3        0   0   3  0.229538
4        0   0   4  0.274867
...     ..  ..  ..       ...
160078  26  76  72  0.404251
160079  26  76  73  0.010852
160080  26  76  74  0.048079
160081  26  76  75  0.426528
160082  26  76  76  0.723565
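
The same long-format table can also be built without a Python-level loop. What follows is a minimal sketch rather than a drop-in replacement: the helper name `array_to_long_frame` and its parameters are made up for illustration, and it assumes the axis names are passed in array-axis order (axis 0 first).

```python
import numpy as np
import pandas as pd


def array_to_long_frame(arr, axis_names, value_name="v"):
    """Return one row per element: its index along every axis plus its value.

    Hypothetical helper; axis_names must be given in array-axis order.
    """
    arr = np.asarray(arr)
    if len(axis_names) != arr.ndim:
        raise ValueError("need exactly one name per array axis")
    # np.indices returns an array of shape (ndim, *arr.shape); flattening each
    # index grid in C order lines it up with arr.ravel(), which is also C order.
    grids = np.indices(arr.shape).reshape(arr.ndim, -1)
    columns = {name: grid for name, grid in zip(axis_names, grids)}
    columns[value_name] = arr.ravel()
    return pd.DataFrame(columns)


# Small stand-in for the (27, 77, 77) array from the question.
arr = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
df = array_to_long_frame(arr, axis_names=("z", "y", "x"))
```

Selecting `df[["x", "y", "z", "v"]]` afterwards gives the column order asked for in the question.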

Upvotes: 2

Tim Roberts

Reputation: 54698

This does it in a primitive way.

import numpy as np
import pandas as pd

data = np.ones( (27,77,77) )

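rows = []
# i walks axis 0 (z), j walks axis 1 (y), k walks axis 2 (x)
for i, plane in enumerate(data):
    for j, row in enumerate(plane):
        for k, value in enumerate(row):
            rows.append([k, j, i, value])  # stored in x, y, z, val order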

df = pd.DataFrame( rows, columns=['x','y','z','val'])
print(df)

Output:

C:\tmp>python x.py
         x   y   z  val
0        0   0   0  1.0
1        1   0   0  1.0
2        2   0   0  1.0
3        3   0   0  1.0
4        4   0   0  1.0
...     ..  ..  ..  ...
160078  72  76  26  1.0
160079  73  76  26  1.0
160080  74  76  26  1.0
160081  75  76  26  1.0
160082  76  76  26  1.0

[160083 rows x 4 columns]

C:\tmp>
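
If the nested loops turn out to be too slow on larger arrays, the same table can probably be built with a `MultiIndex` instead. This is only a sketch: it assumes axis 0 is z, axis 1 is y and axis 2 is x, as in the question, and the small `data` array and the name `long_df` are just for illustration.

```python
import numpy as np
import pandas as pd

# Small stand-in for the (27, 77, 77) array.
data = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)

# Name the index levels in the array's own axis order: axis 0 = z, 1 = y, 2 = x.
index = pd.MultiIndex.from_product(
    [range(n) for n in data.shape], names=["z", "y", "x"]
)

# ravel() flattens in C order (last axis fastest), which is the same order
# in which from_product enumerates the index tuples.
long_df = pd.Series(data.ravel(), index=index, name="val").reset_index()

# Reorder the columns to match the layout asked for in the question.
long_df = long_df[["x", "y", "z", "val"]]
```

The rows come out in the same order as the loop version, with x changing fastest.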

Upvotes: 2
