I have a dataframe that looks like this:
A
1 [67.0, 51.0, 23.0, 49.0, 3.0]
2 0
3 [595.0]
4 0
5 [446.0, 564.0, 402.0]
6 0
7 0
I would like to find the mean for each list ignoring the zeros. I want to get something like:
   A                              Mean
1  [67.0, 51.0, 23.0, 49.0, 3.0]  38.6
2  0                              0
3  [595.0]                        595.0
4  0                              0
5  [446.0, 564.0, 402.0]          470.7
6  0                              0
7  0                              0
I tried many possible solutions listed here and none of them worked. This is what I tried so far:
df['Mean'] = df.A.apply(lambda x: mean(x))
which gives me this error
TypeError: 'int' object is not iterable
Also this
df['Mean'] = df['A'].mean(axis=1)
ValueError: No axis named 1 for object type
Tried these as well with no luck:
a = np.array( df['A'].tolist())
a.mean(axis=1)
mean(d for d in a if d)
Is there something else I can try that would give me the expected outcome? Thanks for your help.
Upvotes: 0
Reputation: 6181
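from collections.abc import Iterable
import numpy as np

def calculate_mean(x):
    # x is one row; average the cell only when it holds an iterable (a list)
    if isinstance(x["A"], Iterable):
        x["mean"] = np.mean(x["A"])
    else:
        # scalar rows (the zeros) keep their own value as the mean
        x["mean"] = x["A"]
    return x

df = df.apply(calculate_mean, axis=1)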
Edit: np.mean also accepts a scalar, so the explicit check can be dropped:
df["mean"] = df.apply(lambda x: np.mean(x["A"]), axis=1)
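The same check can also be applied to the column directly instead of row by row. The following is only a sketch: the helper name `mean_or_scalar` and the three-row sample frame are made up for illustration, and the extra exclusion of strings is an added safeguard (strings are iterable too), not part of the answer above.

```python
from collections.abc import Iterable

import numpy as np
import pandas as pd


def mean_or_scalar(value):
    """Return the mean of an iterable cell, or the cell itself if it is a scalar."""
    # Strings are iterable as well, so exclude them explicitly (added safeguard).
    if isinstance(value, Iterable) and not isinstance(value, (str, bytes)):
        return float(np.mean(list(value)))
    return value


# Hypothetical sample with the same mix of lists and scalar zeros.
df = pd.DataFrame({"A": [[446.0, 564.0, 402.0], 0, [595.0]]})
df["mean"] = df["A"].map(mean_or_scalar)
```

Mapping over a single column avoids building a new row object for every call, which is what `apply(..., axis=1)` does.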
Upvotes: 0
Reputation: 88236
One way is to use a list comprehension and compute the mean only where a given row is a list, which can be checked with isinstance. This check is necessary; otherwise you will get:
TypeError: 'int' object is not iterable
This happens because mean expects an iterable and the zero rows are plain ints. So you can do:
from statistics import mean
df['mean'] = [mean(i) if isinstance(i, list) else i for i in df.A]
   A                              mean
0  [67.0, 51.0, 23.0, 49.0, 3.0]  38.600000
1  0                              0.000000
2  [595.0]                        595.000000
3  0                              0.000000
4  [446.0, 564.0, 402.0]          470.666667
5  0                              0.000000
6  0                              0.000000
Or you can use np.mean, which handles both ints and iterables:
import numpy as np
df['mean'] = df.A.map(np.mean)
   A                              mean
0  [67.0, 51.0, 23.0, 49.0, 3.0]  38.600000
1  0                              0.000000
2  [595.0]                        595.000000
3  0                              0.000000
4  [446.0, 564.0, 402.0]          470.666667
5  0                              0.000000
6  0                              0.000000
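For anyone who wants to try both approaches side by side, here is a self-contained sketch. The frame is rebuilt by hand from the data in the question (the 1..7 index and the column names `mean_listcomp` and `mean_numpy` are assumptions made for this example).

```python
from statistics import mean

import numpy as np
import pandas as pd

# Sample data reconstructed from the question; the index 1..7 is an assumption.
df = pd.DataFrame(
    {"A": [[67.0, 51.0, 23.0, 49.0, 3.0], 0, [595.0], 0,
           [446.0, 564.0, 402.0], 0, 0]},
    index=range(1, 8),
)

# Approach 1: only call mean() on cells that actually hold a list.
df["mean_listcomp"] = [mean(v) if isinstance(v, list) else v for v in df["A"]]

# Approach 2: np.mean accepts scalars as well as lists.
df["mean_numpy"] = df["A"].map(np.mean)
```

Both columns should agree on this data. One difference to keep in mind: for an empty list, statistics.mean raises StatisticsError, while np.mean returns nan with a warning.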
Upvotes: 1
Reputation: 3770
This works for me with the following data:
A
1 [67.0, 51.0, 23.0, 49.0, 3.0]
2 0
3 [595.0]
4 0
5 [446.0, 564.0, 402.0]
6 0
7 0
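Using np.mean (note that eval only works here if the values in A are stored as strings such as "[595.0]", for example after reading the frame from a CSV file):
data['Mean'] = data['A'].apply(lambda x: np.mean(eval(x)))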
Output
   A                              Mean
1  [67.0, 51.0, 23.0, 49.0, 3.0]  38.600000
2  0                              0.000000
3  [595.0]                        595.000000
4  0                              0.000000
5  [446.0, 564.0, 402.0]          470.666667
6  0                              0.000000
7  0                              0.000000
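If the column really does hold strings, ast.literal_eval from the standard library is a safer substitute for eval, because it only parses Python literals and will not execute arbitrary code. A minimal sketch, assuming a hypothetical frame whose cells are all strings:

```python
import ast

import numpy as np
import pandas as pd

# Hypothetical frame where every cell is a string, as after reading a CSV.
data = pd.DataFrame({"A": ["[67.0, 51.0, 23.0, 49.0, 3.0]", "0", "[595.0]"]})

# literal_eval turns "[595.0]" into a list and "0" into an int;
# np.mean then handles both cases.
data["Mean"] = data["A"].apply(lambda s: np.mean(ast.literal_eval(s)))
```

If the cells are already real lists and ints, neither eval nor literal_eval is needed, and passing a list to either one raises an error.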
Upvotes: 1