Asra Khalid

Reputation: 177

How to deal with NaN values in data in Python?

I have a large data set containing many NaN values in multiple columns.

I tried the following code, but it does not drop the NaN values from the data set:

df = pd.read_excel('sec3_data.xlsx')
df.dropna(subset=["Deviation from Partisanship"])
df['Deviation from Partisanship'].unique()

Output:

array([nan, 'Vote for opposing party', 'Vote for own party'], dtype=object)

This clearly shows that some NaN values are still present. How can I remove them?

Upvotes: 1

Views: 1190

Answers (3)

Pierre Gourseaud

Reputation: 2477

import pandas as pd

# Method 1: drop the rows in place, modifying df itself
df = pd.read_excel('sec3_data.xlsx')
df.dropna(subset=["Deviation from Partisanship"], inplace=True)
df['Deviation from Partisanship'].unique()

# Method 2: assign the result to a new DataFrame and leave df unchanged
df = pd.read_excel('sec3_data.xlsx')
df2 = df.dropna(subset=["Deviation from Partisanship"])
df2['Deviation from Partisanship'].unique()

Upvotes: 0

Simon

Reputation: 10158

You need to either reassign to a new dataframe:

df2 = df.dropna(subset=["Deviation from Partisanship"])

Or perform the drop in place:

df.dropna(subset=["Deviation from Partisanship"], inplace=True)

You can find more info in the docs here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html
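
To see why the original code appeared to do nothing, here is a minimal sketch that uses a small in-memory DataFrame instead of the Excel file (the sample rows are made up for illustration). By default, dropna returns a new DataFrame and leaves the original untouched, so the result has to be assigned to something:

```python
import numpy as np
import pandas as pd

col = "Deviation from Partisanship"

# Made-up sample data standing in for sec3_data.xlsx
df = pd.DataFrame(
    {col: [np.nan, "Vote for opposing party", "Vote for own party", np.nan]}
)

# The result is computed but never stored, so df still contains NaN
df.dropna(subset=[col])
still_has_nan = df[col].isna().any()

# Assigning the result back replaces df with the filtered copy
df = df.dropna(subset=[col])
has_nan_after = df[col].isna().any()

print(still_has_nan, has_nan_after)
```

The first check reports that NaN values remain, while the second reports that they are gone once the result has been assigned.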

Upvotes: 2

Akash Kumar

Reputation: 1406

You need to assign the result back to df:

df = df.dropna(subset=["Deviation from Partisanship"])

or drop in place:

df.dropna(subset=["Deviation from Partisanship"], inplace=True)
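
Since the question mentions NaN values in multiple columns, the same pattern extends to several columns at once. The sketch below is only an illustration: the "Party" column and the sample rows are hypothetical and not taken from the original data. It counts missing values per column with isna().sum() before and after the drop, which is a quick way to confirm the rows were removed:

```python
import numpy as np
import pandas as pd

# Made-up sample data; "Party" is a hypothetical second column
df = pd.DataFrame(
    {
        "Deviation from Partisanship": [
            np.nan,
            "Vote for opposing party",
            "Vote for own party",
            "Vote for own party",
        ],
        "Party": ["A", np.nan, "B", "A"],
    }
)

# Number of NaN values in each column before dropping
missing_before = df.isna().sum()

# Drop every row that has NaN in any of the listed columns
df = df.dropna(subset=["Deviation from Partisanship", "Party"])

# Number of NaN values in each column after dropping
missing_after = df.isna().sum()

print(missing_before)
print(missing_after)
```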

Upvotes: 1
