Reputation: 27
I have a pandas dataframe in which each row has a numpy array. It looks something like this:
| Column 1    |
|-------------|
| [nan, 4, 5] |
| [3, 2, 6]   |
| [3, 3, 4]   |
I'm trying to get the element-wise average of those arrays, using:
Avg = df['Column 1'].mean()
Although ".mean()" skips NaN by default, that doesn't happen here: the row itself isn't empty, only one value inside its array is missing. So I get the following result:
print(Avg)
> [nan, 3, 5]
How can I ignore the missing value from the first row? Ideally, this is what I am trying to achieve:
print(Avg)
> [3, 3, 5]
*Note that the first average should be (3+3)/2, not (3+3)/3, so filling the arrays with zeros is not an option.
Upvotes: 1
Views: 2207
Reputation: 323306
Try expanding the arrays into their own columns first, so that mean() runs column by column and skips NaN:
pd.DataFrame(df['Column 1'].tolist()).mean().tolist()
Out[127]: [3.0, 3.0, 5.0]
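For anyone who wants to reproduce this, here is a minimal self-contained sketch. The sample frame is rebuilt from the table in the question, and the variable names (`expanded`, `stacked`, `avg_pandas`, `avg_numpy`) are just illustrative. It also shows a NumPy-only alternative with `np.nanmean`, which is not part of the one-liner above and assumes every array in the column has the same length:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question: one numpy array per row.
df = pd.DataFrame({
    "Column 1": [
        np.array([np.nan, 4, 5]),
        np.array([3, 2, 6]),
        np.array([3, 3, 4]),
    ]
})

# Approach from the answer: expand each array into its own columns,
# then let DataFrame.mean() skip NaN column by column.
expanded = pd.DataFrame(df["Column 1"].tolist())
avg_pandas = expanded.mean().tolist()

# Alternative sketch: stack the arrays into a 2-D array and take the
# NaN-aware mean down the rows (assumes equal-length arrays).
stacked = np.stack(df["Column 1"].tolist())
avg_numpy = np.nanmean(stacked, axis=0).tolist()

print(avg_pandas)  # [3.0, 3.0, 5.0]
print(avg_numpy)   # [3.0, 3.0, 5.0]
```

Both versions divide the first position by 2 rather than 3, because the missing value is dropped from the count instead of being treated as zero.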
Upvotes: 1