Ben
Ben

Reputation: 329

construct stacked dataFrame from 2D numpy array

I have a numpy array with a shape (47365L, 14L), I would like to convert it into make 2 column dataFrame where the first column is the id, which is the number of the column in my original array and the second one is the data which is the content of the column like in the image below.

enter image description here

I tried this code buut without success.

    S.shape
    >>(47365L, 14L)
    IC_Label = ['sub_'+str(subNum) +'_IC_'+str(i) for i in range(0,S.shape[1])]
    ICA_df=pd.DataFrame(data=S, columns= IC_Label)

output:

        sub_1_IC_0  sub_1_IC_1  sub_1_IC_2  sub_1_IC_3  sub_1_IC_4 ............sub_1_IC_13
    0   -0.002277   -0.003315   -0.001300   0.000283    0.000473
    1   -0.004442   0.002921    0.000517    -0.000033   0.000349
    2   -0.003370   0.006067    0.002504    -0.000359   0.001467
    3   0.000075    0.004708    0.000087    -0.000176   0.002561
    .
    .
    47364

My data looks like this:

    ('MyData', array([[ -2.27721244e-03,  -3.31485020e-03,  -1.30029316e-03, ...,
     -1.33952356e-03,   2.93513841e-03,   1.22486268e-03],
   [ -4.44161427e-03,   2.92134270e-03,   5.17394574e-04, ...,
     -1.42587472e-03,   4.74996003e-03,   2.86699268e-03],
   [ -3.36958390e-03,   6.06671900e-03,   2.50417650e-03, ...,
      3.80042801e-03,   4.77244983e-03,   2.82142774e-03],
   ..., 
   [  3.57952033e-04,  -3.55516707e-04,   1.32481754e-03, ...,
      7.55565993e-03,   1.52018736e-02,  -4.67047470e-03],
   [ -2.07206006e-03,   6.60686763e-04,   2.04637699e-03, ...,
      7.94521155e-03,   1.50024315e-02,  -4.88461803e-03],
   [  1.43106607e-03,  -1.77342544e-03,   4.57835186e-05, ...,
      5.55311541e-03,   1.61148688e-02,  -4.63583610e-03]]))

Is there any way to do it?

Thank you

Upvotes: 1

Views: 49

Answers (1)

jezrael
jezrael

Reputation: 862681

I think you need select second values of tuple by [1], DataFrame constructor and last add_prefix:

df = pd.DataFrame(a[1]).add_prefix('sub_1_IC_')
print (df)
   sub_1_IC_0  sub_1_IC_1  sub_1_IC_2  sub_1_IC_3  sub_1_IC_4  sub_1_IC_5
0   -0.002277   -0.003315   -0.001300   -0.001340    0.002935    0.001225
1   -0.004442    0.002921    0.000517   -0.001426    0.004750    0.002867
2   -0.003370    0.006067    0.002504    0.003800    0.004772    0.002821
3    0.000358   -0.000356    0.001325    0.007556    0.015202   -0.004670
4   -0.002072    0.000661    0.002046    0.007945    0.015002   -0.004885
5    0.001431   -0.001773    0.000046    0.005553    0.016115   -0.004636

And then use melt:

df = pd.DataFrame(a[1]).add_prefix('CL_').melt(var_name='id', value_name='data')
print (df.head(10))
     id      data
0  CL_0 -0.002277
1  CL_0 -0.004442
2  CL_0 -0.003370
3  CL_0  0.000358
4  CL_0 -0.002072
5  CL_0  0.001431
6  CL_1 -0.003315
7  CL_1  0.002921
8  CL_1  0.006067
9  CL_1 -0.000356

Another alternative with numpy.ravel for flattening with numpy.repeat:

df = pd.DataFrame({'data':a[1].T.ravel(), 
                   'id': np.repeat(np.arange(6), 6)}, columns=['id','data'])
df['id'] = 'CL' + df['id'].astype(str)
print (df.head(10))
     id      data
0  CL_0 -0.002277
1  CL_0 -0.004442
2  CL_0 -0.003370
3  CL_0  0.000358
4  CL_0 -0.002072
5  CL_0  0.001431
6  CL_1 -0.003315
7  CL_1  0.002921
8  CL_1  0.006067
9  CL_1 -0.000356

Upvotes: 1

Related Questions