Reputation: 177
I have a large data set containing many NaN values in multiple columns.
I tried the following code, but it does not drop the NaN values from the data set:
df = pd.read_excel('sec3_data.xlsx')
df.dropna(subset=["Deviation from Partisanship"])
df['Deviation from Partisanship'].unique()
Output:
array([nan, 'Vote for opposing party', 'Vote for own party'], dtype=object)
This clearly shows that NaN values are still present. How can I remove them?
Upvotes: 1
Views: 1190
Reputation: 2477
# Method 1: drop the rows in place (modifies df directly)
df = pd.read_excel('sec3_data.xlsx')
df.dropna(subset=["Deviation from Partisanship"], inplace=True)
df['Deviation from Partisanship'].unique()
# Method 2: keep the DataFrame returned by dropna in a new variable
df = pd.read_excel('sec3_data.xlsx')
df2 = df.dropna(subset=["Deviation from Partisanship"])
df2['Deviation from Partisanship'].unique()
Upvotes: 0
Reputation: 10158
You need to either reassign to a new dataframe:
df2 = df.dropna(subset=["Deviation from Partisanship"])
Or perform the drop in place:
df.dropna(subset=["Deviation from Partisanship"], inplace=True)
You can find more info in the docs here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html
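The reason the original code appears to do nothing is that dropna returns a new DataFrame by default and leaves the original untouched. Here is a minimal sketch of that behaviour. It assumes a small hand-built DataFrame as a stand-in for sec3_data.xlsx, and the variable names (col, cleaned, and so on) are only for illustration:

```python
import numpy as np
import pandas as pd

col = "Deviation from Partisanship"

# Hypothetical stand-in for the data read from the Excel file
df = pd.DataFrame(
    {col: [np.nan, "Vote for opposing party", "Vote for own party", np.nan]}
)

# The returned DataFrame is discarded here, so df itself is unchanged
df.dropna(subset=[col])
still_has_nan = df[col].isna().any()

# Keeping the returned DataFrame gives a copy without the NaN rows
cleaned = df.dropna(subset=[col])
remaining = cleaned[col].unique().tolist()

print(still_has_nan)
print(remaining)
```

Here still_has_nan stays true because the first call's result was thrown away, while cleaned contains only the two vote labels.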
Upvotes: 2
Reputation: 1406
You need to assign the result back to df:
df = df.dropna(subset=["Deviation from Partisanship"])
or drop the rows in place:
df.dropna(subset=["Deviation from Partisanship"], inplace=True)
Upvotes: 1