Romain Martinez
Romain Martinez

Reputation: 85

create a tidy pandas dataframe from a numpy 3d array

I have the following numpy 3d array: mat.data['Sylvain_2015'].shape = (180, 12, 15)

This array is populated with a variable (muscle activation) for each participant (first dimension: 180), for each muscle (second dimension: 12), for each condition (third dimension: 15).

I want to transform this array to the following pandas dataframe:

       muscle  participant    activation  test
0           1            1    100.000000     1
1           1            1     69.322225     2
2           1            1     84.917656     3
3           1            1     80.983069     4
4           1            1     65.163384     5
5           1            1     30.528706     6

Is there a more efficient way than with three for loops:

participants, muscles, tests, relative_mvc = ([] for i in range(4))

    for iparticipant in range(mat.data[idataset].shape[0]):
            for imuscle in range(mat.data[idataset].shape[1]):
                max_mvc = np.nanmax(mat.data[idataset][iparticipant, imuscle, :])
                for itest in range(mat.data[idataset].shape[2]):
                    participants.append(iparticipant+1)
                    datasets.append(idataset)
                    muscles.append(imuscle+1)
                    tests.append(itest+1)
                    # normalize mvc (relative to max)
                    activation.append(mat.data[idataset][iparticipant, imuscle, itest]*100/max_mvc)

df = pd.DataFrame({
    'participant': participants,
    'dataset': datasets,
    'muscle': muscles,
    'test': tests,
    'relative_mvc': relative_mvc,
}).dropna()

Here is a sample of the 3d array for two participants (created with this useful post)

# Array shape: (2, 12, 15)
0.13    0.09    0.11    0.11    0.09    0.04    0.03    0.06    0.11    0.09    0.03    0.10    0.01    0.03    0.08   
0.21    0.36    0.34    0.18    0.25    0.23    0.11    0.05    0.27    0.27    0.13    0.26    0.04    0.02    0.34   
0.16    0.09    0.41    0.28    0.20    0.10    0.16    0.04    0.15    0.25    0.04    0.18    0.02    0.09    0.24   
nan     nan     nan     nan     nan     nan     nan     nan     nan     nan     nan     nan     nan     nan     nan    
0.09    0.09    0.10    0.09    0.08    0.05    0.01    0.02    0.08    0.07    0.08    0.08    0.01    0.02    0.09   
0.17    0.39    0.33    0.21    0.17    0.29    0.06    0.01    0.21    0.25    0.27    0.22    0.03    0.01    0.31   
0.01    0.01    0.01    0.03    0.01    0.01    0.03    0.06    0.01    0.01    0.04    0.01    0.03    0.06    0.01   
0.06    0.01    0.07    0.07    0.07    0.03    0.06    0.12    0.09    0.08    0.04    0.04    0.04    0.03    0.10   
0.01    0.03    0.02    0.01    0.01    0.11    0.10    0.01    0.01    0.01    0.09    0.01    0.04    0.01    0.02   
0.10    0.10    0.14    0.11    0.08    0.03    0.01    0.02    0.05    0.06    0.01    0.09    0.01    0.01    0.10   
0.05    0.03    0.06    0.08    0.08    0.01    0.03    0.02    0.03    0.04    0.02    0.07    0.00    0.02    0.06   
0.04    0.05    0.03    0.02    0.08    0.03    0.02    0.02    0.06    0.05    0.02    0.06    0.03    0.01    0.02   
# New slice
0.21    0.08    0.15    0.11    0.15    0.05    0.01    0.01    0.06    0.04    0.02    0.13    0.02    0.02    0.16   
0.26    0.14    0.18    0.12    0.22    0.10    0.10    0.07    0.12    0.17    0.09    0.18    0.03    0.02    0.13   
0.10    0.13    0.13    0.05    0.08    0.08    0.08    0.03    0.03    0.06    0.10    0.06    0.05    0.02    0.05   
nan     nan     nan     nan     nan     nan     nan     nan     nan     nan     nan     nan     nan     nan     nan    
0.11    0.08    0.10    0.07    0.10    0.05    0.02    0.02    0.05    0.03    0.03    0.10    0.05    0.04    0.10   
0.13    0.20    0.18    0.12    0.12    0.17    0.03    0.01    0.12    0.10    0.12    0.15    0.09    0.04    0.16   
0.02    0.01    0.01    0.06    0.03    0.01    0.03    0.06    0.02    0.01    0.04    0.04    0.04    0.05    0.04   
0.02    0.02    0.04    0.03    0.05    0.04    0.07    0.03    0.04    0.01    0.02    0.06    0.03    0.03    0.03   
0.02    0.03    0.02    0.02    0.02    0.07    0.04    0.02    0.01    0.01    0.04    0.02    0.03    0.02    0.02   
0.07    0.11    0.14    0.03    0.04    0.08    0.01    0.01    0.10    0.11    0.01    0.02    0.01    0.01    0.02   
0.03    0.02    0.03    0.05    0.04    0.01    0.01    0.02    0.01    0.03    0.01    0.04    0.01    0.01    0.03   
0.04    0.05    0.03    0.03    0.04    0.06    0.02    0.01    0.01    0.03    0.05    0.03    0.03    0.02    0.02   
# New slice

Upvotes: 3

Views: 1749

Answers (1)

Gerges
Gerges

Reputation: 6499

I can think of a couple of ways. Here is one way without using loops, where make a Panel first, and then convert to dataframe.

# normalize values first
max_values = np.nanmax(mat.data[idataset], axis=2)
values = mat.data[idataset]*100/max_values.reshape(max_values.shape +(1,))

# get sizes
iparticipant, imuscle, itest = mat.data[idataset].shape

# set axes labels
items = np.arange(1, 1+iparticipant)
major_axis = np.arange(1, imuscle+1)
minor_axis = np.arange(1, itest + 1)

# make a panel (3-d dataframe)
panel = pd.Panel(values, items=iparticipant, major_axis=major_axis, minor_axis=minor_axis)

# covert to dataframe and fix column labels
df = panel.to_frame().stack().reset_index()
df.columns = ['muscle', 'test', 'participant', 'relative_mvc']
df['dataset'] = idataset

Upvotes: 2

Related Questions