Ivan
Ivan

Reputation: 7746

How to drop rows that contain NaN from a DataFrame

I have a DataFrame that looks like this. How do I get rid of the rows that contain a Nan?

I have tried several iterations, for example

df = df.dropna()

none of them work

     pSpot        quoteDatetime underlyingSymbol           expiration  strike optionType     bid     ask  underlyingBid  underlyingAsk      iv       id
0   1.0000  2017-01-06T12:00:00              SPX  2017-01-20T00:00:00    2280          C 13.0000 14.0000      2277.4000      2278.0800  9.3101   0.0000
27  1.0000  2017-01-06T12:00:00              SPX  2017-02-17T00:00:00    2280          C 26.7000 27.9000      2277.4000      2278.0800  8.9137  27.0000
113 1.0000  2017-01-06T12:00:00              SPX  2017-03-17T00:00:00    2280          C 39.3000 40.7000      2277.4000      2278.0800  9.5947 113.0000
253 1.0000  2017-01-06T12:00:00              SPX  2017-04-21T00:00:00    2280          C 52.5000 54.1000      2277.4000      2278.0800 10.0710 253.0000
0   1.0010  2017-01-06T12:00:00              SPX  2017-06-16T00:00:00    2280          C     NaN     NaN      2277.4000      2278.0800 10.5000      nan
0   1.0010  2017-01-06T12:00:00              SPX  2017-09-15T00:00:00    2280          C     NaN     NaN      2277.4000      2278.0800 11.1300      nan
0   1.0010  2017-01-06T12:00:00              SPX  2017-12-15T00:00:00    2280          C     NaN     NaN      2277.4000      2278.0800 11.5800      nan
0   1.0010  2017-01-06T12:00:00              SPX  2018-01-19T00:00:00    2280          C     NaN     NaN      2277.4000      2278.0800 11.7200      nan
0   1.0010  2017-01-06T12:00:00              SPX  2018-06-15T00:00:00    2280          C     NaN     NaN      2277.4000      2278.0800 12.1000      nan
0   1.0010  2017-01-06T12:00:00              SPX  2018-12-21T00:00:00    2280          C     NaN     NaN      2277.4000      2278.0800 11.7600      nan
0   1.0010  2017-01-06T12:00:00              SPX  2019-12-20T00:00:00    2280          C     NaN     NaN      2277.4000      2278.0800 12.0600      nan


df.head.to_dict() is not very enlightening:

{'pSpot': {0: 1.0, 1: 1.0, 2: 1.01, 3: 1.01, 4: 1.01}, 'quoteDatetime': {0: '2017-01-06T12:00:00', 1: '2017-01-06T12:00:00', 2: '2017-01-06T12:00:00', 3: '2017-01-06T12:00:00', 4: '2017-01-06T12:00:00'}, 'underlyingSymbol': {0: 'SPX', 1: 'SPX', 2: 'SPX', 3: 'SPX', 4: 'SPX'}, 'expiration': {0: '2017-01-20T00:00:00', 1: '2017-01-20T00:00:00', 2: '2017-01-20T00:00:00', 3: '2017-01-20T00:00:00', 4: '2017-01-20T00:00:00'}, 'strike': {0: 2280, 1: 2285, 2: 2290, 3: 2295, 4: 2300}, 'optionType': {0: 'C', 1: 'C', 2: 'C', 3: 'C', 4: 'C'}, 'bid': {0: 13.0, 1: 10.4, 2: 8.7, 3: 6.5, 4: 5.1}, 'ask': {0: 14.0, 1: 11.4, 2: 9.2, 3: 7.4, 4: 5.9}, 'underlyingBid': {0: 2277.4, 1: 2277.4, 2: 2277.4, 3: 2277.4, 4: 2277.4}, 'underlyingAsk': {0: 2278.08, 1: 2278.08, 2: 2278.08, 3: 2278.08, 4: 2278.08}, 'iv': {0: 9.3101, 1: 9.088, 2: 9.1241, 3: 8.9417, 4: 8.902}, 'id': {0: 0.0, 1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0}}

print(df.dtypes)

pSpot               float64
quoteDatetime        object
underlyingSymbol     object
expiration           object
strike                int64
optionType           object
bid                  object
ask                  object
underlyingBid       float64
underlyingAsk       float64
iv                  float64
id                  float64
dtype: object

This is really frustrating. if I try to convert the bid column to a string, it doesn't work:

Before:

print(df.dtypes)

pSpot               float64
quoteDatetime        object
underlyingSymbol     object
expiration           object
strike                int64
optionType           object
bid                  object
ask                  object
underlyingBid       float64
underlyingAsk       float64
iv                  float64
id                  float64
dtype: object

After

df[['bid']] = df[['bid']].astype(str)
print(df.dtypes)

pSpot               float64
quoteDatetime        object
underlyingSymbol     object
expiration           object
strike                int64
optionType           object
bid                  object
ask                  object
underlyingBid       float64
underlyingAsk       float64
iv                  float64
id                  float64
dtype: object

Upvotes: 0

Views: 3026

Answers (3)

YoungMin Park
YoungMin Park

Reputation: 1189

# Drop rows which have any NaN (you need to use this)
df2=df.dropna()

enter image description here

# Drop rows which have all NaN in its row
df2=df.dropna(how='all')

enter image description here

# Drow rows which have at least 2 NaNs
df2=df.dropna(thresh=2)

enter image description here

# Drow rows which have NaNs in specific column
df2=df.dropna(subset=[1])

enter image description here


Note.

To expect the result as you predict, data type of columns should be consistent

Following code might work only if 3 columns data type are consistent, for example, float in following example

df2=df.dropna()

So, before using above function, check the data type of your dataframe by using print(df.dtypes)

enter image description here

Upvotes: 5

Boskosnitch
Boskosnitch

Reputation: 774

Two things;

1: the 'how' parameter specifies how many items in the row / column need to be NaN in order for it to be dropped. So by setting how='all', it will only drop the rows which are all NaN. If you want to drop rows that contain NaN in any space, you can specify how='any', which is the default.

2: As others have said, calling the operation without passing inplace=True will not modify the dataframe. You have can either assign the operation to a dataframe, or pass the inplace=True agrument. I believe best practice is to assign to another df as the inplace argument will be deprecated and has larger overhead than returning the value.

Try this:

df2 = df.dropna()

Or:

df.dropna(inplace=True)

Upvotes: 0

Diogo Silva
Diogo Silva

Reputation: 330

You are effectively dropping the NaN rows. However, the method returns a new copy with the rows removed. You need to do this:

df=df.dropna()

Or use inplace=True, which will alter the original dataframe.

df.dropna(inplace=True)

Upvotes: -2

Related Questions