ICantHandleThis

Reputation: 65

Replacing string values with mean of column in dataframe

I have a data file that I load and process with a pandas DataFrame. My code works, but I'm wondering if there is a more efficient way to achieve what I'm trying to do. My code is as follows:

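import pandas as pd

df = pd.read_csv("file_name.data", sep=r"\s+", names=["A","B","Horsepower"])
df1 = df[df.Horsepower != '?']  # rows where Horsepower is known
df2 = df1["Horsepower"].apply(pd.to_numeric)  # strings to numbers
df = df.replace('?', df2.mean())  # replace returns a new frame, so assign it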

In the data itself, the Horsepower column comes with several missing values that have been replaced with '?'. The above code replaces these '?' values with the mean of the Horsepower column, excluding the '?' values.

With that established, is there a more efficient way to replace the '?' values in "Horsepower" with the mean of the "Horsepower" column?

Upvotes: 2

Views: 2287

Answers (1)

ALollz

Reputation: 59529

This will work. Passing errors='coerce' to pd.to_numeric converts anything that can't be parsed as a number to NaN, and the mean then skips those NaN values.

# Requires `import numpy as np` in addition to pandas.
df['Horsepower'] = df['Horsepower'].replace('?',
    np.mean(pd.to_numeric(df['Horsepower'], errors='coerce')))
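A possible variation, offered as a sketch rather than as part of the answer above: if the '?' markers can be dealt with at load time, the `na_values` parameter of `pd.read_csv` parses them as NaN, and `fillna` can then substitute the column mean. This also leaves "Horsepower" as a float column instead of a mix of strings and one float. The inline sample data below is hypothetical and only stands in for file_name.data.

```python
import io

import pandas as pd

# Hypothetical sample standing in for file_name.data; '?' marks a missing value.
raw = io.StringIO("1 2 130.0\n3 4 ?\n5 6 150.0\n7 8 ?\n")

# na_values="?" makes read_csv parse '?' as NaN, so Horsepower loads as float64.
df = pd.read_csv(raw, sep=r"\s+", names=["A", "B", "Horsepower"], na_values="?")

# Series.mean() skips NaN by default, so this is the mean of the known values.
df["Horsepower"] = df["Horsepower"].fillna(df["Horsepower"].mean())

# If the frame is already loaded with '?' strings, convert the column first:
# df["Horsepower"] = pd.to_numeric(df["Horsepower"], errors="coerce")
```

With the sample above, the two '?' rows are filled with the mean of 130.0 and 150.0.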

Upvotes: 2
