Arun

Reputation: 2478

Dealing with missing values in a column using pandas

I am using the Auto MPG dataset which contains missing values in the column/attribute horsepower in the form of ? characters.

When I run either

data.isnull().values.any()

or

data["horsepower"].isnull().values.any()

both return False, because isnull() only detects NaN or blank values, not the ? placeholder.

How can I locate missing values that are marked with a special character, which in my case happens to be ?, rather than the usual NaN?

Thanks!

Upvotes: 2

Views: 1248

Answers (3)

patelnisheet

Reputation: 134

You need to convert ? to NaN first. After that, you can check for null values.

1) Convert ? to NaN (replace returns a new DataFrame, so assign the result back):

data = data.replace('?', np.nan)

2) Find the null values:

pd.isna(data['horsepower'])

This returns a boolean Series that is True wherever horsepower is missing.
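To make the two steps concrete, here is a minimal sketch on a tiny made-up frame (the three rows are hypothetical stand-ins for the Auto MPG data, not real values). It also shows how the boolean result can be used to count or select the affected rows.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Auto MPG data; '?' marks a missing horsepower.
data = pd.DataFrame({
    "mpg": [18.0, 25.0, 21.0],
    "horsepower": ["130", "?", "95"],
})

# Step 1: replace returns a new DataFrame, so assign the result back.
data = data.replace("?", np.nan)

# Step 2: boolean Series, True where horsepower is missing.
mask = pd.isna(data["horsepower"])

n_missing = int(mask.sum())   # how many values were '?'
missing_rows = data[mask]     # the rows that had '?'
```

Because the column was read as strings, the remaining values are still text after this; pd.to_numeric(data["horsepower"]) can convert them if numbers are needed.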

Upvotes: 1

anky

Reputation: 75080

You can specify ? in na_values (a read_csv parameter), or use the following:

df.replace(r'[\W]', np.nan, regex=True)

\W matches any character that is not a letter, a digit, or the underscore.
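One thing to watch with the regex approach: as far as I can tell, when the replacement is not a string, pandas replaces the whole cell if the pattern matches anywhere in it. So \W can also blank out text columns whose values contain spaces, such as car names. The sketch below compares it with a narrower pattern that only matches cells consisting of exactly ?. The frame and the narrower pattern are my own illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one '?' placeholder and a text column with spaces.
df = pd.DataFrame({
    "horsepower": ["130", "?", "95"],
    "car name": ["ford torino", "ford pinto", "amc hornet"],
})

# Broad pattern: any string cell containing a non-word character becomes NaN.
broad = df.replace(r"[\W]", np.nan, regex=True)

# Narrow pattern (an assumption for illustration): only cells that are exactly '?'.
narrow = df.replace(r"^\?$", np.nan, regex=True)

print(broad.isna().sum())
print(narrow.isna().sum())
```

If the frame only has numeric-looking columns, the broad pattern is fine; with free-text columns the anchored pattern is the safer choice.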

Upvotes: 2

jezrael

Reputation: 862511

Use replace before checking for NaNs:

data["horsepower"].replace('?',np.nan).isnull().values.any()

If the DataFrame is created by read_csv, add the na_values parameter to convert ? to NaN while reading:

data = pd.read_csv(path, na_values=["?"])
data["horsepower"].isnull().values.any()
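A self-contained way to try the na_values route without the real file is to read from an in-memory string. The CSV text below is made up for illustration; only the read_csv call mirrors the snippet above.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the Auto MPG file.
csv_text = "mpg,horsepower\n18.0,130\n25.0,?\n21.0,95\n"

# Treat '?' as missing while parsing.
data = pd.read_csv(io.StringIO(csv_text), na_values=["?"])

has_missing = bool(data["horsepower"].isnull().values.any())
missing_count = int(data["horsepower"].isnull().sum())
hp_dtype = data["horsepower"].dtype
```

A side benefit of handling ? at read time is that horsepower is parsed as a numeric column rather than as strings, so no separate conversion step is needed afterwards.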

Upvotes: 2
