Reputation: 1730
I'm trying to get the aggregate column-wise median of a series of arrays. For example:
import numpy as np
import pandas as pd
a = np.array([[1,9,3],[1,1,1],[8,5,4]])
df = pd.DataFrame(columns=["a"])
df["a"] = list(a)
df["b"] = [1,1,2]
A = df.groupby("b")["a"].apply(lambda x: np.mean(x, axis=0))
print(A)
B = df.groupby("b")["a"].apply(lambda x: np.median(x, axis=0))
print(B)
Getting the mean works fine, but the median gives the error
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
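A minimal reproduction outside of groupby, as a sketch using only the first two rows of the example data, shows that the same call fails on a plain Series whose elements are arrays:

```python
import numpy as np
import pandas as pd

# First two rows of the example, stored as a Series whose elements are 1-D arrays
x = pd.Series(list(np.array([[1, 9, 3], [1, 1, 1]])))

error = None
try:
    np.median(x, axis=0)  # the same call the groupby lambda makes
except ValueError as exc:
    error = exc

print(type(error).__name__)
```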
Upvotes: 1
Views: 729
Reputation: 35626
np.mean is supported explicitly by both NumPy and pandas. When np.mean receives something other than a plain ndarray, it checks whether the object has a mean attribute (numpy source code). If it does, that method is called instead, so for a pandas Series the pandas NDFrame.mean function does the work (pandas source code).
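To illustrate that delegation, here is a sketch with a hypothetical class HasMean (not part of NumPy or pandas) that only defines a mean method. np.mean hands the call straight to that method, passing axis, dtype and out as keyword arguments:

```python
import numpy as np


class HasMean:
    """Hypothetical stand-in for any non-ndarray object that has a .mean method."""

    def mean(self, axis=None, dtype=None, out=None):
        # Return a marker so we can see that np.mean called this method
        return ("delegated", axis)


print(np.mean(HasMean(), axis=0))
print(np.mean(HasMean()))
```

np.median performs no equivalent lookup for a median method, which is why the two functions behave differently on the same Series.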
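However, np.median has no such support: NumPy does not check for a median attribute it could delegate to. Instead it converts the Series into a 1d object array whose elements are the row arrays, and ordering those elements means comparing whole arrays, which is what raises the ambiguous truth value error.
For this reason, the values need to be converted into a proper 2d array first, either explicitly with np.array or implicitly by passing np.median a list of arrays that it can convert itself.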
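A small sketch of the difference, assuming the first group's two rows: np.asarray (roughly the conversion np.median applies internally) keeps the Series one-dimensional with whole arrays as elements, while unpacking it into a list first gives a numeric 2d array that np.median can reduce column by column:

```python
import numpy as np
import pandas as pd

x = pd.Series(list(np.array([[1, 9, 3], [1, 1, 1]])))

as_objects = np.asarray(x)  # shape (2,), dtype object: each element is a whole row
as_matrix = np.array([*x])  # shape (2, 3), integer dtype: a real 2d array

print(as_objects.shape, as_objects.dtype)
print(as_matrix.shape)
print(np.median(as_matrix, axis=0))  # column-wise median of the two rows
```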
B = df.groupby("b")["a"].apply(lambda x: np.median([*x], axis=0))
Printing B gives:
b
1 [1.0, 5.0, 2.0]
2 [8.0, 5.0, 4.0]
Name: a, dtype: object
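For completeness, a self-contained version of the fix. Imports are added, and the DataFrame is built in a single constructor call instead of column by column; that is an equivalent setup of my own, not the question's exact code:

```python
import numpy as np
import pandas as pd

a = np.array([[1, 9, 3], [1, 1, 1], [8, 5, 4]])
# Column "a" holds one 1-D array per row; "b" is the grouping key
df = pd.DataFrame({"a": list(a), "b": [1, 1, 2]})

# Unpack each group's arrays into a list so np.median sees a 2d structure
B = df.groupby("b")["a"].apply(lambda x: np.median([*x], axis=0))
print(B)
```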
Any of the following would also work in place of np.median([*x], axis=0):
np.median(x.tolist(), axis=0)
np.median(np.array([*x]), axis=0)
np.median(np.array(x.tolist()), axis=0)
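A quick check that these variants agree on the first group. The np.stack line is an extra option not mentioned above; it is included because it raises a ValueError if the arrays in a group differ in length, rather than relying on how NumPy treats ragged input:

```python
import numpy as np
import pandas as pd

x = pd.Series(list(np.array([[1, 9, 3], [1, 1, 1]])))

candidates = {
    "[*x]": np.median([*x], axis=0),
    "x.tolist()": np.median(x.tolist(), axis=0),
    "np.array([*x])": np.median(np.array([*x]), axis=0),
    "np.array(x.tolist())": np.median(np.array(x.tolist()), axis=0),
    # Not from the answer above: np.stack builds the same 2d array explicitly
    "np.stack(x.tolist())": np.median(np.stack(x.tolist()), axis=0),
}

for name, result in candidates.items():
    print(f"{name:22} {result}")
```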
Upvotes: 2