Reputation: 85
I have the following numpy 3d array:
mat.data['Sylvain_2015'].shape = (180, 12, 15)
This array is populated with a variable (muscle activation) for each participant
(first dimension: 180), for each muscle
(second dimension: 12), for each condition
(third dimension: 15).
I want to transform this array to the following pandas dataframe:
muscle participant activation test
0 1 1 100.000000 1
1 1 1 69.322225 2
2 1 1 84.917656 3
3 1 1 80.983069 4
4 1 1 65.163384 5
5 1 1 30.528706 6
Is there a more efficient way than with three for loops:
participants, muscles, tests, relative_mvc = ([] for i in range(4))
for iparticipant in range(mat.data[idataset].shape[0]):
for imuscle in range(mat.data[idataset].shape[1]):
max_mvc = np.nanmax(mat.data[idataset][iparticipant, imuscle, :])
for itest in range(mat.data[idataset].shape[2]):
participants.append(iparticipant+1)
datasets.append(idataset)
muscles.append(imuscle+1)
tests.append(itest+1)
# normalize mvc (relative to max)
activation.append(mat.data[idataset][iparticipant, imuscle, itest]*100/max_mvc)
df = pd.DataFrame({
'participant': participants,
'dataset': datasets,
'muscle': muscles,
'test': tests,
'relative_mvc': relative_mvc,
}).dropna()
Here is a sample of the 3d array for two participants (created with this useful post)
# Array shape: (2, 12, 15)
0.13 0.09 0.11 0.11 0.09 0.04 0.03 0.06 0.11 0.09 0.03 0.10 0.01 0.03 0.08
0.21 0.36 0.34 0.18 0.25 0.23 0.11 0.05 0.27 0.27 0.13 0.26 0.04 0.02 0.34
0.16 0.09 0.41 0.28 0.20 0.10 0.16 0.04 0.15 0.25 0.04 0.18 0.02 0.09 0.24
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
0.09 0.09 0.10 0.09 0.08 0.05 0.01 0.02 0.08 0.07 0.08 0.08 0.01 0.02 0.09
0.17 0.39 0.33 0.21 0.17 0.29 0.06 0.01 0.21 0.25 0.27 0.22 0.03 0.01 0.31
0.01 0.01 0.01 0.03 0.01 0.01 0.03 0.06 0.01 0.01 0.04 0.01 0.03 0.06 0.01
0.06 0.01 0.07 0.07 0.07 0.03 0.06 0.12 0.09 0.08 0.04 0.04 0.04 0.03 0.10
0.01 0.03 0.02 0.01 0.01 0.11 0.10 0.01 0.01 0.01 0.09 0.01 0.04 0.01 0.02
0.10 0.10 0.14 0.11 0.08 0.03 0.01 0.02 0.05 0.06 0.01 0.09 0.01 0.01 0.10
0.05 0.03 0.06 0.08 0.08 0.01 0.03 0.02 0.03 0.04 0.02 0.07 0.00 0.02 0.06
0.04 0.05 0.03 0.02 0.08 0.03 0.02 0.02 0.06 0.05 0.02 0.06 0.03 0.01 0.02
# New slice
0.21 0.08 0.15 0.11 0.15 0.05 0.01 0.01 0.06 0.04 0.02 0.13 0.02 0.02 0.16
0.26 0.14 0.18 0.12 0.22 0.10 0.10 0.07 0.12 0.17 0.09 0.18 0.03 0.02 0.13
0.10 0.13 0.13 0.05 0.08 0.08 0.08 0.03 0.03 0.06 0.10 0.06 0.05 0.02 0.05
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
0.11 0.08 0.10 0.07 0.10 0.05 0.02 0.02 0.05 0.03 0.03 0.10 0.05 0.04 0.10
0.13 0.20 0.18 0.12 0.12 0.17 0.03 0.01 0.12 0.10 0.12 0.15 0.09 0.04 0.16
0.02 0.01 0.01 0.06 0.03 0.01 0.03 0.06 0.02 0.01 0.04 0.04 0.04 0.05 0.04
0.02 0.02 0.04 0.03 0.05 0.04 0.07 0.03 0.04 0.01 0.02 0.06 0.03 0.03 0.03
0.02 0.03 0.02 0.02 0.02 0.07 0.04 0.02 0.01 0.01 0.04 0.02 0.03 0.02 0.02
0.07 0.11 0.14 0.03 0.04 0.08 0.01 0.01 0.10 0.11 0.01 0.02 0.01 0.01 0.02
0.03 0.02 0.03 0.05 0.04 0.01 0.01 0.02 0.01 0.03 0.01 0.04 0.01 0.01 0.03
0.04 0.05 0.03 0.03 0.04 0.06 0.02 0.01 0.01 0.03 0.05 0.03 0.03 0.02 0.02
# New slice
Upvotes: 3
Views: 1749
Reputation: 6499
I can think of a couple of ways. Here is one way without using loops, where make a Panel first, and then convert to dataframe.
# normalize values first
max_values = np.nanmax(mat.data[idataset], axis=2)
values = mat.data[idataset]*100/max_values.reshape(max_values.shape +(1,))
# get sizes
iparticipant, imuscle, itest = mat.data[idataset].shape
# set axes labels
items = np.arange(1, 1+iparticipant)
major_axis = np.arange(1, imuscle+1)
minor_axis = np.arange(1, itest + 1)
# make a panel (3-d dataframe)
panel = pd.Panel(values, items=iparticipant, major_axis=major_axis, minor_axis=minor_axis)
# covert to dataframe and fix column labels
df = panel.to_frame().stack().reset_index()
df.columns = ['muscle', 'test', 'participant', 'relative_mvc']
df['dataset'] = idataset
Upvotes: 2