user36160

Reputation: 21

Issue with dropna() function and alternatives to the dropna()

I am learning to use the pandas dropna() function to drop rows/columns that contain NaN or '?' values. However, even after trying various solutions I found online, no data is dropped, although I get no syntax errors.

I've tried the following solutions:

First Attempt

df1 = df.dropna()
df1

Continued

df1.dropna(inplace=1)
df1

The first part of the code gave me the original data frame, unchanged.

The second part gave me the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
in ()
----> 1 df1.dropna(inplace=1)
      2
      3 df1

~\Anaconda3\lib\site-packages\pandas\core\frame.py in dropna(self, axis, how, thresh, subset, inplace)
   4259         1  Batman  Batmobile 1940-04-25
   4260         """
-> 4261         inplace = validate_bool_kwarg(inplace, 'inplace')
   4262         if isinstance(axis, (tuple, list)):
   4263             # GH20987

~\Anaconda3\lib\site-packages\pandas\util\_validators.py in validate_bool_kwarg(value, arg_name)
    224         raise ValueError('For argument "{arg}" expected type bool, received '
    225                          'type {typ}.'.format(arg=arg_name,
--> 226                                               typ=type(value).__name__))
    227     return value
    228

ValueError: For argument "inplace" expected type bool, received type

Further, are there any better alternatives to the dropna() function?


EDIT 1

  1. Link to my Python notebook Dealing with Missing Data.ipynb
  2. I changed the value of the inplace argument to True, but that gives me the following error:

NameError: name 'df1' is not defined

PS All the errors and issues are visible in the code

LINK TO THE CSV FILE USED = CSV


Upvotes: 0

Views: 3125

Answers (2)

JuSt

Reputation: 1

You should also pass inplace=True to replace(); otherwise it returns a new DataFrame and leaves df unchanged:

df.replace("?", np.nan, inplace=True)
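To illustrate why this matters, here is a minimal sketch. The small DataFrame and its column names are made up for the example and are not the data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "?" marks a missing value, as in the question.
df = pd.DataFrame({"make": ["audi", "?", "bmw"],
                   "price": ["13950", "16500", "?"]})

# Without inplace=True (or reassignment) the replaced copy is discarded.
df.replace("?", np.nan)
rows_before = len(df.dropna())  # "?" is an ordinary string, so nothing is dropped

# With inplace=True the frame itself is modified, so dropna() has NaNs to find.
df.replace("?", np.nan, inplace=True)
rows_after = len(df.dropna())

print(rows_before, rows_after)
```

Only the second dropna() call removes rows, because "?" is not treated as missing until it has been turned into NaN.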

Upvotes: 0

Jaskumar Shah

Reputation: 26

First, replace ? with NaN and assign the result back, because replace() returns a new DataFrame rather than modifying df:

df = df.replace('?', np.nan)

Then drop the rows with missing values (the NaNs you just introduced) using dropna():

df1 = df.dropna()
df1

If you call dropna() again on df1, pass inplace=True (a bool, not 1) so that the result stays in the same variable:

df1.dropna(inplace=True)
df1
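Putting it together, the following is a rough sketch rather than a definitive recipe. The CSV text is a made-up stand-in for the file in the question. It also shows one possible alternative, assuming you read the file yourself: the na_values parameter of read_csv can mark "?" as missing at load time.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV contents standing in for the file from the question.
csv_text = "make,price\naudi,13950\n?,16500\nbmw,?\n"

# Option 1: replace "?" with NaN, reassign, then drop incomplete rows.
df = pd.read_csv(io.StringIO(csv_text))
df = df.replace("?", np.nan)
df1 = df.dropna()

# Option 2: let read_csv treat "?" as missing from the start.
df2 = pd.read_csv(io.StringIO(csv_text), na_values="?").dropna()

# inplace must be a real bool; an integer such as 1 is rejected.
try:
    df.dropna(inplace=1)
    raised = False
except ValueError:
    raised = True

print(len(df1), len(df2), raised)
```

With option 2 the price column is parsed as numeric straight away, whereas option 1 leaves it as text until you convert it.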

Upvotes: 1
