joao leal

Reputation: 19

Calculate the mean in pandas while a column has a string

I am currently learning pandas and I am using an IMDb movies database, in which one of the columns is the duration of the movies. However, one of the values is the string "None", so I can't calculate the mean. I thought of changing "None" to 0, but that would skew the results, as can be seen with the code below.

dur_temp = duration.replace("None", 0)
dur_temp = dur_temp.astype(float)
descricao_duration = dur_temp.mean()

Any ideas on what I should do to avoid skewing the data? I also graphed it, and the plot makes the skew even clearer.

Upvotes: 0

Views: 1780

Answers (5)

ansev

Reputation: 30920

If you want this to work for any string in your pandas Series, you could use pd.to_numeric:

pd.to_numeric(duration, errors='coerce').mean()

This way, every value that cannot be converted to a number is replaced by NaN, whatever it is, and mean() skips NaN by default.
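As a minimal sketch of the approach above, the following uses made-up sample data (the values in `duration` are an assumption, not the asker's real column) to show that the "None" entry is dropped from the average rather than counted as 0:

```python
import pandas as pd

# Hypothetical sample data: durations stored as strings, with one "None" entry.
duration = pd.Series(["120", "95", "None", "150"])

# Anything that cannot be parsed as a number becomes NaN instead of raising.
numeric = pd.to_numeric(duration, errors="coerce")

# Series.mean() ignores NaN by default (skipna=True), so only the three
# valid durations contribute to the result.
mean_duration = numeric.mean()
print(mean_duration)
```

The mean here is taken over the three parseable values only, so it is not pulled down the way it would be after replacing "None" with 0.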

Upvotes: 3

Arvind Kumar Avinash

Reputation: 79065

If the missing values are real None/NaN entries rather than the string "None", you can use fillna(value=np.nan) as shown below:

descricao_duration = dur_temp.fillna(value=np.nan).mean()

Demo:

import pandas as pd
import numpy as np

dur_temp = pd.DataFrame({'duration': [10, 20, None, 15, None]})
descricao_duration = dur_temp.fillna(value=np.nan).mean()
print(descricao_duration)

Output:

duration    15.0
dtype: float64

Upvotes: 0

Faika Majid

Reputation: 77

Replace them with np.nan values.

I am writing this as an answer because I can't comment:

df = df.replace('None', np.nan)

or

df.replace('None', np.nan, inplace=True)
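A rough sketch of how this could look end to end, assuming a hypothetical DataFrame with a `duration` column that holds numbers as strings (both the frame and the column name are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the column name "duration" is an assumption.
df = pd.DataFrame({"duration": ["120", "None", "90"]})

# Swap the literal string "None" for a real missing value.
df = df.replace("None", np.nan)

# The remaining entries are still strings, so convert them before averaging.
df["duration"] = df["duration"].astype(float)

# NaN is skipped, so the mean is taken over the two valid rows.
mean_duration = df["duration"].mean()
print(mean_duration)
```

The explicit astype(float) step matters when the column was read in as text; if the other values are already numeric, it is harmless.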

Upvotes: 1

Tanmay Shrivastava

Reputation: 579

Just filter by condition, like this:

df[df['a'] != 'None']  # assuming the duration values are in column a
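For illustration only, here is one way the filter could be combined with the mean, assuming a hypothetical column `a` whose values were read in as strings:

```python
import pandas as pd

# Hypothetical data; column "a" stands in for the duration column.
df = pd.DataFrame({"a": ["100", "None", "140"]})

# Keep only the rows whose value is not the literal string "None".
filtered = df[df["a"] != "None"]

# The surviving values are still strings, so cast to float before averaging.
mean_a = filtered["a"].astype(float).mean()
print(mean_a)
```

Note that this drops whole rows, so it only catches the exact string "None"; any other non-numeric entry would still make the float conversion fail.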

Upvotes: 1

Alessandro

Reputation: 381

You can replace "None" with numpy.nan instead of using 0.

Something like this should do the trick:

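import numpy as np
# Replace the "None" string with NaN, then convert to float as in the question
dur_temp = duration.replace("None", np.nan).astype(float)
# mean() skips NaN by default
descricao_duration = dur_temp.mean()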

Upvotes: 2
