Raphael

Reputation: 1730

Pandas groupby + Numpy median: The truth value of an array with more than one element is ambiguous

I'm trying to get the aggregate column-wise median of a series of arrays. For example:

import numpy as np
import pandas as pd

a = np.array([[1,9,3],[1,1,1],[8,5,4]])
df = pd.DataFrame(columns=["a"])
df["a"] = list(a)
df["b"] = [1,1,2]
A = df.groupby("b")["a"].apply(lambda x: np.mean(x, axis=0))
print(A)
B = df.groupby("b")["a"].apply(lambda x: np.median(x, axis=0))
print(B)

Getting the mean works fine, but the median raises this error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Upvotes: 1

Views: 729

Answers (1)

Henry Ecker

Reputation: 35626

np.mean is explicitly supported by both numpy and pandas. When the argument is not a plain ndarray, np.mean checks whether the passed-in object has a mean attribute (numpy source code). If it does, numpy calls that method, so for a Series the pandas NDFrame.mean function is used instead (pandas source code).
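To see that delegation in isolation, here is a minimal sketch using a stand-in class. HasMean is a hypothetical name made up for this illustration; it is not part of numpy or pandas, and the only thing it has in common with a Series is a mean method.

```python
import numpy as np


class HasMean:
    """Hypothetical stand-in for any non-ndarray object that exposes .mean()."""

    def mean(self, axis=None, dtype=None, out=None, **kwargs):
        # np.mean forwards axis, dtype and out to this method.
        return "delegated to HasMean.mean"


# np.mean finds the object's own mean method and calls it
# instead of converting the object to an array first.
result = np.mean(HasMean())
print(result)
```

This is the same route a Series takes when it is passed to np.mean inside the groupby lambda.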

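However, np.median has no such support: numpy does not check whether the object has a median attribute it could use instead. It works on the Series as a 1-D object array whose elements are themselves arrays, and ordering those elements is what raises the error.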

For this reason, the values need to be converted to a valid 2D array first (either explicitly, or implicitly by passing np.median a list of arrays).

B = df.groupby("b")["a"].apply(lambda x: np.median([*x], axis=0))

B:

b
1    [1.0, 5.0, 2.0]
2    [8.0, 5.0, 4.0]
Name: a, dtype: object

The following options would also work:

  • np.median(x.tolist(), axis=0)
  • np.median(np.array([*x]), axis=0)
  • np.median(np.array(x.tolist()), axis=0)
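If the one-liners are hard to read, the same idea can be written out as a named function. This is only a sketch of the approach above, not a different method: columnwise_median is a hypothetical helper name, and np.stack is used here in place of np.array to build the 2D array.

```python
import numpy as np
import pandas as pd

a = np.array([[1, 9, 3], [1, 1, 1], [8, 5, 4]])
df = pd.DataFrame({"b": [1, 1, 2]})
df["a"] = list(a)  # each cell of column "a" holds one 1-D array


def columnwise_median(group):
    # Hypothetical helper: stack the group's arrays into one 2D array
    # (one row per original array), then take the median down each column.
    stacked = np.stack(group.tolist())
    return np.median(stacked, axis=0)


B = df.groupby("b")["a"].apply(columnwise_median)
print(B)
```

Stacking explicitly makes the 2D shape visible, which can help when the arrays in a group might not all have the same length (np.stack raises in that case rather than silently building an object array).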

Upvotes: 2
