Reputation: 7746
I have a DataFrame that looks like this. How do I get rid of the rows that contain a NaN?
I have tried several variations, for example
df = df.dropna()
but none of them work.
pSpot quoteDatetime underlyingSymbol expiration strike optionType bid ask underlyingBid underlyingAsk iv id
0 1.0000 2017-01-06T12:00:00 SPX 2017-01-20T00:00:00 2280 C 13.0000 14.0000 2277.4000 2278.0800 9.3101 0.0000
27 1.0000 2017-01-06T12:00:00 SPX 2017-02-17T00:00:00 2280 C 26.7000 27.9000 2277.4000 2278.0800 8.9137 27.0000
113 1.0000 2017-01-06T12:00:00 SPX 2017-03-17T00:00:00 2280 C 39.3000 40.7000 2277.4000 2278.0800 9.5947 113.0000
253 1.0000 2017-01-06T12:00:00 SPX 2017-04-21T00:00:00 2280 C 52.5000 54.1000 2277.4000 2278.0800 10.0710 253.0000
0 1.0010 2017-01-06T12:00:00 SPX 2017-06-16T00:00:00 2280 C NaN NaN 2277.4000 2278.0800 10.5000 nan
0 1.0010 2017-01-06T12:00:00 SPX 2017-09-15T00:00:00 2280 C NaN NaN 2277.4000 2278.0800 11.1300 nan
0 1.0010 2017-01-06T12:00:00 SPX 2017-12-15T00:00:00 2280 C NaN NaN 2277.4000 2278.0800 11.5800 nan
0 1.0010 2017-01-06T12:00:00 SPX 2018-01-19T00:00:00 2280 C NaN NaN 2277.4000 2278.0800 11.7200 nan
0 1.0010 2017-01-06T12:00:00 SPX 2018-06-15T00:00:00 2280 C NaN NaN 2277.4000 2278.0800 12.1000 nan
0 1.0010 2017-01-06T12:00:00 SPX 2018-12-21T00:00:00 2280 C NaN NaN 2277.4000 2278.0800 11.7600 nan
0 1.0010 2017-01-06T12:00:00 SPX 2019-12-20T00:00:00 2280 C NaN NaN 2277.4000 2278.0800 12.0600 nan
df.head().to_dict() is not very enlightening:
{'pSpot': {0: 1.0, 1: 1.0, 2: 1.01, 3: 1.01, 4: 1.01}, 'quoteDatetime': {0: '2017-01-06T12:00:00', 1: '2017-01-06T12:00:00', 2: '2017-01-06T12:00:00', 3: '2017-01-06T12:00:00', 4: '2017-01-06T12:00:00'}, 'underlyingSymbol': {0: 'SPX', 1: 'SPX', 2: 'SPX', 3: 'SPX', 4: 'SPX'}, 'expiration': {0: '2017-01-20T00:00:00', 1: '2017-01-20T00:00:00', 2: '2017-01-20T00:00:00', 3: '2017-01-20T00:00:00', 4: '2017-01-20T00:00:00'}, 'strike': {0: 2280, 1: 2285, 2: 2290, 3: 2295, 4: 2300}, 'optionType': {0: 'C', 1: 'C', 2: 'C', 3: 'C', 4: 'C'}, 'bid': {0: 13.0, 1: 10.4, 2: 8.7, 3: 6.5, 4: 5.1}, 'ask': {0: 14.0, 1: 11.4, 2: 9.2, 3: 7.4, 4: 5.9}, 'underlyingBid': {0: 2277.4, 1: 2277.4, 2: 2277.4, 3: 2277.4, 4: 2277.4}, 'underlyingAsk': {0: 2278.08, 1: 2278.08, 2: 2278.08, 3: 2278.08, 4: 2278.08}, 'iv': {0: 9.3101, 1: 9.088, 2: 9.1241, 3: 8.9417, 4: 8.902}, 'id': {0: 0.0, 1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0}}
print(df.dtypes)
pSpot float64
quoteDatetime object
underlyingSymbol object
expiration object
strike int64
optionType object
bid object
ask object
underlyingBid float64
underlyingAsk float64
iv float64
id float64
dtype: object
This is really frustrating. If I try to convert the bid column to a string, it doesn't work:
Before:
print(df.dtypes)
pSpot float64
quoteDatetime object
underlyingSymbol object
expiration object
strike int64
optionType object
bid object
ask object
underlyingBid float64
underlyingAsk float64
iv float64
id float64
dtype: object
After:
df[['bid']] = df[['bid']].astype(str)
print(df.dtypes)
pSpot float64
quoteDatetime object
underlyingSymbol object
expiration object
strike int64
optionType object
bid object
ask object
underlyingBid float64
underlyingAsk float64
iv float64
id float64
dtype: object
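I suppose a check like the following (just a sketch using the column names above, nothing I have verified on this data) would at least show whether the NaN entries in bid are real floats or literal strings:
# Sketch: df is the DataFrame shown above.
# If 'bid' has dtype object, the "NaN" cells may be the literal string 'NaN'/'nan'
# rather than a real float('nan'), which dropna() would not remove.
print(df['bid'].apply(type).value_counts())  # Python type of each cell
print(df['bid'].isna().sum())                # how many cells pandas sees as missing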
Upvotes: 0
Views: 3026
Reputation: 1189
# Drop rows which have any NaN (this is what you need)
df2 = df.dropna()
# Drop rows in which every value is NaN
df2 = df.dropna(how='all')
# Keep only rows which have at least 2 non-NaN values
df2 = df.dropna(thresh=2)
# Drop rows which have a NaN in a specific column (e.g. 'bid')
df2 = df.dropna(subset=['bid'])
Note:
To get the result you expect, the data types of the columns need to be consistent.
The following will only work as expected if the columns holding the missing values have a consistent dtype (for example float), so that the missing entries are real NaN values:
df2 = df.dropna()
So, before using the functions above, check the data types of your dataframe with print(df.dtypes).
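As a sketch of that point (assuming, from the dtypes output in the question, that the bid and ask columns are object columns holding the literal strings 'NaN'/'nan'): coercing them to numeric first turns those tokens into real NaN values that dropna() can then remove.
import pandas as pd
# Sketch: coerce the object columns to numeric so that string tokens such as
# 'NaN'/'nan' become real np.nan values, then drop the affected rows.
for col in ['bid', 'ask']:                       # object columns from the question
    df[col] = pd.to_numeric(df[col], errors='coerce')
df2 = df.dropna(subset=['bid', 'ask'])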
Upvotes: 5
Reputation: 774
Two things:
1: The 'how' parameter specifies how many items in the row / column need to be NaN in order for it to be dropped. By setting how='all', it will only drop rows in which every value is NaN. If you want to drop rows that contain a NaN in any column, you can specify how='any', which is the default. A minimal example of the difference is sketched after the code below.
2: As others have said, calling the operation without passing inplace=True will not modify the dataframe. You can either assign the result to a dataframe or pass the inplace=True argument. I believe best practice is to assign to another df, as the inplace argument may be deprecated and has a larger overhead than returning the value.
Try this:
df2 = df.dropna()
Or:
df.dropna(inplace=True)
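A minimal, self-contained sketch of the difference between how='any' and how='all' (toy data, not the asker's DataFrame):
import numpy as np
import pandas as pd

toy = pd.DataFrame({'bid': [13.0, np.nan, np.nan],
                    'ask': [14.0, 11.4, np.nan]})
print(toy.dropna(how='any'))  # keeps only the first row (no NaN anywhere)
print(toy.dropna(how='all'))  # drops only the last row (every value is NaN)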
Upvotes: 0
Reputation: 330
You are effectively dropping the NaN rows. However, the method returns a new copy with the rows removed; it does not modify the original. You need to do this:
df = df.dropna()
Or use inplace=True, which will alter the original dataframe:
df.dropna(inplace=True)
Upvotes: -2