I have code in which multiple lists of different lengths are appended to a list X. For instance, the final value of X after a run can look like this:
X = [[0.6904056370258331, 0.6844439387321473, 0.668782365322113],
[0.7253621816635132, 0.6941058218479157, 0.6929935097694397, 0.6919471859931946, 0.6905447959899902]]
As you can see, X[0] has length 3 while X[1] has length 5. I want to compute an element-wise (column-wise) mean of X to get a single 1D array. If I try np.mean(X, axis=0), it raises an error because X[0] and X[1] have different lengths. Is there a way to achieve what I am looking for, i.e., a single 1D mean of X?
Thank you,
To do 'column' calculations, we need to transpose this into a list of the columns.
In [475]: X = [[0.6904056370258331, 0.6844439387321473, 0.668782365322113],
...: [0.7253621816635132, 0.6941058218479157, 0.6929935097694397, 0.6919471859931946, 0.6905447959899902]]
itertools.zip_longest is a handy tool for 'transposing' irregular lists:
In [476]: import itertools
In [477]: T = list(itertools.zip_longest(*X, fillvalue=np.nan))
In [478]: T
Out[478]:
[(0.6904056370258331, 0.7253621816635132),
(0.6844439387321473, 0.6941058218479157),
(0.668782365322113, 0.6929935097694397),
(nan, 0.6919471859931946),
(nan, 0.6905447959899902)]
I chose np.nan as the fill value because I can then use np.nanmean to take the mean while ignoring the nan values.
In [479]: [np.nanmean(i) for i in T]
Out[479]:
[0.7078839093446732,
0.6892748802900315,
0.6808879375457764,
0.6919471859931946,
0.6905447959899902]
For something like np.sum I could fill with 0s, but the mean is the sum divided by the count, so padded zeros would still be counted and distort the result.
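To make the sum/count point concrete, here is a minimal sketch (not part of the original answer) that pads with 0, which leaves each column's sum unchanged, and then divides by the number of real values in each column instead of the padded length. The names `padded_cols`, `lengths` and `counts` are illustrative, and it assumes X is a non-empty list of lists of numbers.

```python
import itertools

import numpy as np

X = [[0.6904056370258331, 0.6844439387321473, 0.668782365322113],
     [0.7253621816635132, 0.6941058218479157, 0.6929935097694397,
      0.6919471859931946, 0.6905447959899902]]

# Transpose with 0.0 as padding; zeros do not change a column's sum.
# Each row of padded_cols is one "column" of X.
padded_cols = np.array(list(itertools.zip_longest(*X, fillvalue=0.0)))

# Position j of column k is real data only if j < len(X[k]);
# summing that boolean grid gives the number of real values per column.
lengths = np.array([len(row) for row in X])
counts = (np.arange(padded_cols.shape[0])[:, None] < lengths).sum(axis=1)

# Mean = sum of real values / count of real values.
col_means = padded_cols.sum(axis=1) / counts
print(col_means)
```

Dividing by `counts` rather than by `len(X)` is what keeps the zero padding from pulling the last two means down.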
Or, without nanmean, fill with something that's easy to filter out:
In [480]: T = list(itertools.zip_longest(*X, fillvalue=None))
In [483]: [np.mean([i for i in row if i is not None]) for row in T]
Out[483]:
[0.7078839093446732,
0.6892748802900315,
0.6808879375457764,
0.6919471859931946,
0.6905447959899902]
zip_longest isn't the only option, but it's reasonably fast and easy to remember and use.
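As one example of another option, here is a sketch (not from either answer) using np.bincount: every value is tagged with its column index, `weights=` sums the values per index, and a second bincount gives the per-index counts, so no padding or transposing is needed at all. It assumes X is a non-empty list of lists of numbers; `col_idx` and `values` are illustrative names.

```python
import numpy as np

X = [[0.6904056370258331, 0.6844439387321473, 0.668782365322113],
     [0.7253621816635132, 0.6941058218479157, 0.6929935097694397,
      0.6919471859931946, 0.6905447959899902]]

# Column index of every value: 0, 1, ..., len(row) - 1 for each row.
col_idx = np.concatenate([np.arange(len(row)) for row in X])
# All values flattened in the same order as col_idx.
values = np.concatenate([np.asarray(row, dtype=float) for row in X])

# Per-column sums (weighted bincount) and per-column counts (plain bincount).
sums = np.bincount(col_idx, weights=values)
counts = np.bincount(col_idx)

col_means = sums / counts
print(col_means)
```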
How about this: first determine the maximum row length, then pad all rows to that length with NaNs, and then use np.nanmean with axis=0 as in the question.
import numpy as np
X = [[0.6904056370258331, 0.6844439387321473, 0.668782365322113],
[0.7253621816635132, 0.6941058218479157, 0.6929935097694397, 0.6919471859931946, 0.6905447959899902]]
# Length of the longest row
max_row_len = max(len(row) for row in X)
# Pad every row with NaN up to max_row_len so the array is rectangular
padded = [row + [np.nan] * (max_row_len - len(row)) for row in X]
# Column-wise mean that ignores the NaN padding
cm = np.nanmean(padded, axis=0)
print(cm)
will display
[0.70788391 0.68927488 0.68088794 0.69194719 0.6905448 ]
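A possible variation on this padding idea, sketched with NumPy masked arrays instead of NaN: the padded slots are masked out based on each row's length, so the fill value itself never matters. This is a suggested alternative rather than the answerer's code; `pad_mask`, `data` and `ragged` are illustrative names, and it assumes X is a non-empty list of lists of numbers.

```python
import numpy as np

X = [[0.6904056370258331, 0.6844439387321473, 0.668782365322113],
     [0.7253621816635132, 0.6941058218479157, 0.6929935097694397,
      0.6919471859931946, 0.6905447959899902]]

lengths = np.array([len(row) for row in X])
max_len = lengths.max()

# True where a position lies past the end of its row, i.e. padding.
pad_mask = np.arange(max_len) >= lengths[:, None]

# Copy each row into a rectangular array; padding stays 0 but gets masked.
data = np.zeros((len(X), max_len))
for i, row in enumerate(X):
    data[i, :len(row)] = row

ragged = np.ma.masked_array(data, mask=pad_mask)
# Masked entries are skipped by the reduction; filled() returns a plain ndarray.
col_means = ragged.mean(axis=0).filled(np.nan)
print(col_means)
```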