Reputation: 1371
This code:
import re
import pandas as pd

class_size = data["class_size"]

def is_912(string):
    # Treat missing values as "not a 09-12 row"
    if pd.isnull(string):
        return False
    if re.search('09-12', string) is not None:
        return True
    else:
        return False

is_9or12 = class_size['GRADE '].apply(is_912)
class_size['GRADE '] = class_size['GRADE '][is_9or12 == True]
print(class_size['GRADE '])
Gives the following result:
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
...
27605 09-12
27606 09-12
27607 09-12
27608 09-12
27609 09-12
Why isn't my function filtering out the NaN values?
Upvotes: 0
Views: 329
Reputation: 2638
It's because you're only replacing the values of the column GRADE where the corresponding value in your is_9or12 vector is True. As it sounds like you may not fundamentally understand what you've built, I'll try to explain why:
is_9or12=class_size['GRADE '].apply(is_912)
This line of code is creating a new vector (array) of True or False values, of the same length as the GRADE column of the DataFrame. Since we can't see your data set, I'll assume for now the logic in the is_912 function works as you expect.
class_size['GRADE ']=class_size['GRADE '][is_9or12==True]
This line of code replaces the values in the GRADE column with themselves for the index labels where the is_9or12 vector has a True value. Rows where the vector is False are missing from the right-hand side, so pandas aligns on the index and fills them with NaN.
Since the assignment never removes any rows from the DataFrame, the NaN values in the column stay where they were, and every non-matching row becomes NaN as well.
Again, without a reproducible example, this is only speculation, but I think it's what you're asking for.
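To make the explanation above concrete, here is a minimal sketch. The five-row class_size frame is a hypothetical stand-in (the real data isn't shown in the question), and the mask is built with a plain equality test rather than the original is_912 function. It contrasts assigning the filtered column back with filtering the DataFrame itself:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for class_size; the real data is not shown.
class_size = pd.DataFrame({"GRADE ": ["0K", np.nan, "09-12", "01", "09-12"]})

# Boolean mask, same length as the column (NaN compares as False).
is_9or12 = class_size["GRADE "] == "09-12"

# What the question does: assign the filtered Series back to the column.
# pandas aligns on the index, so rows absent from the right-hand side become NaN.
broken = class_size.copy()
broken["GRADE "] = broken["GRADE "][is_9or12]

# What was probably intended: filter the DataFrame's rows with the mask.
filtered = class_size[is_9or12]

print(broken)    # still 5 rows, non-matching ones are NaN
print(filtered)  # only the 09-12 rows remain
```

If this matches what you're after, the fix would be to index the whole DataFrame with the mask instead of assigning to a single column.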
Upvotes: 0
Reputation: 1
If you want to drop the NaNs, you can use the dropna method. Look at this example:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.nan, index=np.arange(5), columns=['A'])
without_nan = df.dropna(subset=['A'])
In some cases, for numeric columns, it will be better to use:
df[np.isfinite(df['A'])]
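As a rough illustration on data closer to the question, the sketch below applies dropna to a string column. The small frame and its values are made up for the example. Because np.isfinite expects numeric input, the mask-based alternative here uses notna instead:

```python
import numpy as np
import pandas as pd

# Hypothetical string column with missing values, loosely modelled on 'GRADE '.
df = pd.DataFrame({"GRADE ": ["09-12", np.nan, "0K", np.nan]})

# Drop rows whose 'GRADE ' value is missing.
without_nan = df.dropna(subset=["GRADE "])

# Equivalent boolean-mask form that also works for string columns.
same_rows = df[df["GRADE "].notna()]

print(without_nan)
```

Both forms return a new DataFrame and leave df unchanged.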
Upvotes: 0
Reputation: 249153
Stop using apply(). It defeats the purpose of Pandas, which is efficient vectorized computation. It reintroduces Python looping and slow execution, and 98% of the time it is used without necessity.
Try something like this:
class_size = data["class_size"]
is_9or12 = class_size['GRADE '].isin(('09', '10', '11', '12'))
It isn't really clear from your question what your data look like, but here are some string-specific methods that work in Pandas and are fast:
https://pandas.pydata.org/pandas-docs/stable/text.html
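For instance, if the column really holds strings such as '09-12' (an assumption based on the printed output in the question), a vectorized version of the original check might look like the sketch below. The sample frame is hypothetical; str.contains with na=False treats missing values as non-matches:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for class_size; the real data is not shown.
class_size = pd.DataFrame({"GRADE ": ["0K", np.nan, "09-12", "01", "09-12"]})

# Vectorized substring test; na=False maps NaN to False instead of NaN.
is_9or12 = class_size["GRADE "].str.contains("09-12", na=False, regex=False)

# Keep only the matching rows.
result = class_size[is_9or12]

print(result)
```

This does the work of the is_912 function and the apply() call in one step, without a Python-level loop.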
Upvotes: 0