Moran Reznik

Reputation: 1371

Can't filter NaN values while using pandas

This code:

import re
import pandas as pd

class_size=data["class_size"]
def is_912(string):
    # Missing values (NaN) never match
    if pd.isnull(string):
        return False
    if re.search('09-12',string) is not None:
        return True
    else:
        return False

is_9or12=class_size['GRADE '].apply(is_912)
class_size['GRADE ']=class_size['GRADE '][is_9or12==True]
print(class_size['GRADE '])

Gives the following result:

0          NaN
1          NaN
2          NaN
3          NaN
4          NaN
5          NaN
6          NaN
...
27605    09-12
27606    09-12
27607    09-12
27608    09-12
27609    09-12

I can't understand why my function isn't filtering out the NaN values.

Upvotes: 0

Views: 329

Answers (3)

Peter Mularien

Reputation: 2638

It's because you're only keeping the values of the GRADE column where the corresponding value in your is_9or12 vector is True. As it sounds like you may not fully understand what you've built, I'll try to explain why:

is_9or12=class_size['GRADE '].apply(is_912)

This line of code is creating a new vector (array) of True or False values, of the same length as the GRADE column of the DataFrame. Since we can't see your data set, I'll assume for now the logic in the is_912 function works as you expect.

class_size['GRADE ']=class_size['GRADE '][is_9or12==True]

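This line of code selects the values of the GRADE column at the indexes where the is_9or12 vector is True, then assigns that shorter selection back to the same column. pandas aligns the assignment on the index, so every row that was not selected is filled with NaN.

Since the assignment never removes any rows from the DataFrame, the column keeps its original length: the rows that already held NaN still hold NaN, and the rows that did not match now hold NaN as well.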

Again, without a reproducible example, this is only speculation, but I think it's what you're asking for.
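
A minimal sketch of what that assignment does, assuming class_size is a DataFrame. The five-row frame is a made-up stand-in for data["class_size"], which the question does not show, and isin stands in for the is_912 function:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data["class_size"]; the real data is not shown.
class_size = pd.DataFrame({"GRADE ": ["0K", np.nan, "09-12", "01", "09-12"]})

# Boolean mask: True only where the value is "09-12" (NaN gives False).
mask = class_size["GRADE "].isin(["09-12"])

# Assigning a filtered Series back to a column aligns on the index:
# the frame keeps all 5 rows, and rows missing from the selection become NaN.
reassigned = class_size.copy()
reassigned["GRADE "] = reassigned["GRADE "][mask]

# Indexing the DataFrame itself with the mask is what actually drops rows.
filtered = class_size[mask]

print(len(reassigned), len(filtered))
```

If the goal is to keep only the matching rows, indexing the whole DataFrame with the mask (the last step) is probably what was intended.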

Upvotes: 0

T.Az

Reputation: 1

If you want to drop the NaNs, you can use the dropna method. For example:

df = pd.DataFrame(np.nan, index=np.arange(5), columns=['A'])

without_nan = df.dropna(subset = ['A'])

For a numeric column, a boolean mask built with np.isfinite also works (np.isfinite does not accept string columns such as GRADE):

df[np.isfinite(df['A'])]
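
As a rough illustration on a string column like the one in the question, here is a sketch using a made-up four-row frame. The column name 'GRADE ' (with its trailing space) is copied from the question; the values are assumptions:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a string column that contains missing values.
df = pd.DataFrame({"GRADE ": ["09-12", np.nan, "0K", np.nan]})

# dropna returns a new frame without the rows where 'GRADE ' is missing.
without_nan = df.dropna(subset=["GRADE "])

# Equivalent boolean-mask form; notna() works for any dtype, strings included.
also_without_nan = df[df["GRADE "].notna()]

print(without_nan)
```

Both forms return a new DataFrame and leave df unchanged, so the result has to be assigned to a name to be kept.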

Upvotes: 0

John Zwinck

Reputation: 249153

Stop using apply(). It defeats the purpose of Pandas, which is efficient vectorized computation. It reintroduces Python looping and slow execution, and 98% of the time it is used without necessity.

Try something like this:

class_size = data["class_size"]
is_9or12 = class_size['GRADE '].isin(('09', '10', '11', '12'))

It isn't really clear from your question what your data look like, but here are some string-specific methods that work in Pandas and are fast:

https://pandas.pydata.org/pandas-docs/stable/text.html
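
If the values really are strings such as '09-12' (as the printed output in the question suggests), one of those string methods, str.contains, may be closer to what the original function does. A sketch under that assumption, with a made-up frame and a hypothetical SIZE column:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data["class_size"]; SIZE values are made up.
class_size = pd.DataFrame({
    "GRADE ": ["0K", np.nan, "09-12", "01", "09-12"],
    "SIZE": [20, 25, 30, 22, 28],
})

# Vectorized substring test; na=False makes missing values count as "no match",
# and regex=False treats '09-12' as a literal string.
is_9to12 = class_size["GRADE "].str.contains("09-12", na=False, regex=False)

# Index the whole DataFrame with the mask to keep only the matching rows.
result = class_size[is_9to12]

print(result)
```

This replaces both the is_912 function and the apply() call with a single vectorized expression.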

Upvotes: 0
