add-semi-colons

Reputation: 18810

Pandas DataFrame: removing rows with 'nan' by column name

After reading an Excel file with pandas read_excel, I end up with rows whose value is the string 'nan'. I tried to drop them using all the available methods discussed here, but none of them seem to work.

Here are the attempts:

df.dropna(subset=['A'], inplace=True)

I thought this would work. It reduced the number of rows in the data frame, but the rows containing 'nan' were not removed.

df = df[df.A.str.match('nan') == False]
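A minimal reproduction, assuming column A mixes genuinely empty cells with the literal string 'nan' (the data below is made up for illustration). dropna only treats real missing values (NaN/None) as missing, so it removes the genuine blanks but keeps the string 'nan'. That would explain why the row count drops while the 'nan' rows stay:

```python
import numpy as np
import pandas as pd

# Made-up data: row 1 is a real NaN, row 2 is the literal string 'nan'
df = pd.DataFrame({"A": ["x", np.nan, "nan", "y"]})

dropped = df.dropna(subset=["A"])
print(dropped)

# The real NaN is gone, but the string 'nan' survives
print((dropped["A"] == "nan").any())  # True
```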

Upvotes: 0

Views: 2948

Answers (2)

Bharath M Shetty

Reputation: 30605

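A better way is boolean indexing. The values are the literal string 'nan', not real NaN, so dropna does not treat them as missing, but a direct comparison filters them out, e.g.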

import pandas as pd

df = pd.DataFrame({"A":['nan',1,2,3],'B':[1,2,3,'nan']})

# To remove 'nan's from only A
print(df[(df.A!='nan')])

#   A    B
#1  1    2
#2  2    3
#3  3  nan


# To remove every row that holds 'nan' in any column
print(df[(df != 'nan').all(axis=1)])
#   A  B
#1  1  2
#2  2  3
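Since the title asks about removing rows by column name, the same boolean-indexing idea can be extended to a list of named columns. This is a minimal sketch: drop_string_nan is a hypothetical helper name, and it assumes the placeholder is always the exact lowercase string 'nan':

```python
import pandas as pd

def drop_string_nan(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Drop rows where any of the named columns holds the literal string 'nan'."""
    # isin(["nan"]) flags cells equal to the string 'nan';
    # any(axis=1) marks a row if at least one selected column is flagged
    mask = df[columns].isin(["nan"]).any(axis=1)
    return df[~mask]

df = pd.DataFrame({"A": ["nan", 1, 2, 3], "B": [1, 2, 3, "nan"]})
print(drop_string_nan(df, ["A"]))       # keeps rows 1, 2, 3
print(drop_string_nan(df, ["A", "B"]))  # keeps rows 1, 2
```

Passing only the columns you care about means a 'nan' in some other column does not cause the row to be dropped.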

Upvotes: 1

BENY

Reputation: 323306

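We can replace the string 'nan' with a real NaN first, then use dropna. Assign the result back: replace() returns a new frame, so calling dropna(inplace=True) on it would only modify that temporary copy and leave df unchanged.

import numpy as np
df = df.replace({'A': {'nan': np.nan}}).dropna(subset=['A'])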
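The same replace-then-dropna idea can be sketched for all columns at once. This assumes the placeholder is always the exact string 'nan' and uses a small made-up frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["nan", 1, 2, 3], "B": [1, 2, 3, "nan"]})

# Turn the literal string 'nan' into a real missing value in every column
cleaned = df.replace("nan", np.nan)

# Now dropna recognises it: subset=["A"] checks only column A,
# while dropna() with no subset drops a row if any column is missing
only_a = cleaned.dropna(subset=["A"])
any_col = cleaned.dropna()

print(only_a)
print(any_col)
```

Converting to real NaN first has the side benefit that other pandas tools (isna, fillna, etc.) also treat those cells as missing afterwards.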

Upvotes: 1
