Reputation: 329
I have a numpy array with a shape (47365L, 14L), I would like to convert it into make 2 column dataFrame where the first column is the id, which is the number of the column in my original array and the second one is the data which is the content of the column like in the image below.
I tried this code buut without success.
S.shape
>>(47365L, 14L)
IC_Label = ['sub_'+str(subNum) +'_IC_'+str(i) for i in range(0,S.shape[1])]
ICA_df=pd.DataFrame(data=S, columns= IC_Label)
output:
sub_1_IC_0 sub_1_IC_1 sub_1_IC_2 sub_1_IC_3 sub_1_IC_4 ............sub_1_IC_13
0 -0.002277 -0.003315 -0.001300 0.000283 0.000473
1 -0.004442 0.002921 0.000517 -0.000033 0.000349
2 -0.003370 0.006067 0.002504 -0.000359 0.001467
3 0.000075 0.004708 0.000087 -0.000176 0.002561
.
.
47364
My data looks like this:
('MyData', array([[ -2.27721244e-03, -3.31485020e-03, -1.30029316e-03, ...,
-1.33952356e-03, 2.93513841e-03, 1.22486268e-03],
[ -4.44161427e-03, 2.92134270e-03, 5.17394574e-04, ...,
-1.42587472e-03, 4.74996003e-03, 2.86699268e-03],
[ -3.36958390e-03, 6.06671900e-03, 2.50417650e-03, ...,
3.80042801e-03, 4.77244983e-03, 2.82142774e-03],
...,
[ 3.57952033e-04, -3.55516707e-04, 1.32481754e-03, ...,
7.55565993e-03, 1.52018736e-02, -4.67047470e-03],
[ -2.07206006e-03, 6.60686763e-04, 2.04637699e-03, ...,
7.94521155e-03, 1.50024315e-02, -4.88461803e-03],
[ 1.43106607e-03, -1.77342544e-03, 4.57835186e-05, ...,
5.55311541e-03, 1.61148688e-02, -4.63583610e-03]]))
Is there any way to do it?
Thank you
Upvotes: 1
Views: 49
Reputation: 862681
I think you need select second values of tuple by [1]
, DataFrame
constructor and last add_prefix
:
df = pd.DataFrame(a[1]).add_prefix('sub_1_IC_')
print (df)
sub_1_IC_0 sub_1_IC_1 sub_1_IC_2 sub_1_IC_3 sub_1_IC_4 sub_1_IC_5
0 -0.002277 -0.003315 -0.001300 -0.001340 0.002935 0.001225
1 -0.004442 0.002921 0.000517 -0.001426 0.004750 0.002867
2 -0.003370 0.006067 0.002504 0.003800 0.004772 0.002821
3 0.000358 -0.000356 0.001325 0.007556 0.015202 -0.004670
4 -0.002072 0.000661 0.002046 0.007945 0.015002 -0.004885
5 0.001431 -0.001773 0.000046 0.005553 0.016115 -0.004636
And then use melt
:
df = pd.DataFrame(a[1]).add_prefix('CL_').melt(var_name='id', value_name='data')
print (df.head(10))
id data
0 CL_0 -0.002277
1 CL_0 -0.004442
2 CL_0 -0.003370
3 CL_0 0.000358
4 CL_0 -0.002072
5 CL_0 0.001431
6 CL_1 -0.003315
7 CL_1 0.002921
8 CL_1 0.006067
9 CL_1 -0.000356
Another alternative with numpy.ravel
for flattening with numpy.repeat
:
df = pd.DataFrame({'data':a[1].T.ravel(),
'id': np.repeat(np.arange(6), 6)}, columns=['id','data'])
df['id'] = 'CL' + df['id'].astype(str)
print (df.head(10))
id data
0 CL_0 -0.002277
1 CL_0 -0.004442
2 CL_0 -0.003370
3 CL_0 0.000358
4 CL_0 -0.002072
5 CL_0 0.001431
6 CL_1 -0.003315
7 CL_1 0.002921
8 CL_1 0.006067
9 CL_1 -0.000356
Upvotes: 1