RCarmody

Reputation: 720

Pandas Cannot Filter Out nan Values

I have no idea why this isn't working... Why am I not able to get rid of these?

I have tried the following:

dfa = dfa[dfa['Date Sold_y'].str.len() < 4] #empty
dfa = dfa[dfa['Date Sold_y'] != ''] #no change
dfa = dfa[dfa['Date Sold_y'] != np.nan] #no change

The dtype is string, and sample values are below:

['May-30-2018', nan, nan, 'June-11-2014', 'December-3-2021', nan, 'February-2-2022', nan, nan, 'December-30-2011', nan, nan, nan, nan, nan, nan, nan, nan, 'November-30-2021', nan, 'April-1-2020', nan, 'May-10-2007', nan, nan, nan, nan, nan, nan, 'January-28-2022', nan, nan, nan, 'January-18-2022', nan, nan, nan, 'January-12-2022', nan, 'November-15-2021'

Upvotes: 1

Views: 62

Answers (2)

Davide Laghi

Reputation: 126

By the way, if the values are actually nan (and not strings), check out the dropna() method of pandas.DataFrame. It lets you drop rows that contain one or more nan values (you can choose whether any or all of them must be missing), or you can specify a subset of columns to check for nan values.
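A minimal sketch of the dropna() options mentioned above. The sample frame is made up for illustration: only the column name 'Date Sold_y' comes from the question, and the 'Price' column is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the 'Date Sold_y' name is from the question.
dfa = pd.DataFrame({
    "Date Sold_y": ["May-30-2018", np.nan, np.nan, "June-11-2014"],
    "Price": [100.0, 200.0, np.nan, 400.0],
})

# Check only 'Date Sold_y' for missing values; other columns are ignored.
by_subset = dfa.dropna(subset=["Date Sold_y"])

# how="any" (the default) drops a row if at least one column is missing.
any_missing = dfa.dropna(how="any")

# how="all" drops a row only if every column is missing.
all_missing = dfa.dropna(how="all")
```

Note that this only helps when the values are real missing values. If they are the literal text 'nan', dropna() will leave them in place.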

Upvotes: 0

Corralien

Reputation: 120391

  1. Maybe the nan values are actually strings with extra whitespace:
>>> dfa[dfa['Date Sold_y'].str.strip() != 'nan']
         Date Sold_y
0        May-30-2018
3       June-11-2014
4    December-3-2021
6    February-2-2022
9   December-30-2011
18  November-30-2021
20      April-1-2020
22       May-10-2007
29   January-28-2022
33   January-18-2022
37   January-12-2022
39  November-15-2021
  2. You can also reverse the logic and keep only the rows that end with a four-digit year:
>>> dfa[dfa['Date Sold_y'].str.contains(r'\d{4}$', na=False)]
  3. Or, if they really are nan values, as suggested by @HenryEcker:
>>> dfa[dfa['Date Sold_y'].notna()]

# OR

>>> dfa[~dfa['Date Sold_y'].isna()]
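For context, here is a small sketch of why the `!= np.nan` attempt in the question changes nothing, and how the two cases (real nan versus the text 'nan') differ. The series below are made-up samples, not the asker's data, and dtype=object is set explicitly as an assumption about how the column is stored.

```python
import numpy as np
import pandas as pd

# Case 1: real missing values (float NaN) in an object column.
s = pd.Series(["May-30-2018", np.nan, "June-11-2014", np.nan],
              dtype=object, name="Date Sold_y")

# NaN never compares equal to anything, including itself...
nan_equals_itself = np.nan == np.nan

# ...so "!= np.nan" is True for every row and filters nothing.
mask_ne = s != np.nan

# notna() is the check that actually identifies missing values.
filtered = s[s.notna()]

# Case 2: the text 'nan' (possibly padded with whitespace), not a missing value.
text = pd.Series(["May-30-2018", "nan ", " nan"], dtype=object)

# notna() sees ordinary strings here, so nothing would be dropped.
nothing_missing = text.notna().all()

# Stripping whitespace and comparing against the literal text works instead.
kept = text[text.str.strip() != "nan"]
```

Printing `s.map(type).value_counts()` is a quick way to see which case applies: floats in the count mean real nan values, while all `str` means the column holds the text 'nan'.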

Upvotes: 1
