Reputation: 931
I have a dataframe in pandas where one of the column (i.e., column 'b') contains strings with $ symbols:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [51, 2,32,99,81], 'b': ['$3', '$4','$-','$0','$23']})
I want to filter the dataframe such that I only retain the rows where column'b' only returns integers other then zero and the $ symbol is discarded.
My desired output is:
Any feedback is welcome.
Upvotes: 1
Views: 72
Reputation: 210842
In [64]: df = pd.DataFrame({'a': [51, 2,32,99,81], 'b': ['$3', '$4','$-','$0','$23']})
In [65]: df['b'] = pd.to_numeric(df['b'].str.replace(r'\D+', ''), errors='coerce')
In [67]: df
Out[67]:
a b
0 51 3.0
1 2 4.0
2 32 NaN
3 99 0.0
4 81 23.0
In [68]: df = df[df['b'].notnull() & df['b'].ne(0)]
In [69]: df
Out[69]:
a b
0 51 3.0
1 2 4.0
4 81 23.0
alternatively we can filter it this way:
In [73]: df = df.query("b == b and b != 0")
In [74]: df
Out[74]:
a b
0 51 3.0
1 2 4.0
4 81 23.0
Upvotes: 6
Reputation: 402483
Similarly, using pd.to_numeric
(assuming your data has this same structure throughout).
df.b = pd.to_numeric(df.b.str[1:], errors='coerce')
print(df)
a b
0 51 3.0
1 2 4.0
2 32 NaN
3 99 0.0
4 81 23.0
print (df.dropna(subset=['b']))
a b
0 51 3.0
1 2 4.0
3 99 0.0
4 81 23.0
If you want to filter out both NaN
s and zeros, use:
print (df[df.b.notnull() & df.b.ne(0)])
a b
0 51 3.0
1 2 4.0
4 81 23.0
Upvotes: 4
Reputation: 323226
df.b=pd.to_numeric(df['b'].str.replace('$', ''),errors='coerce')
df
Out[603]:
a b
0 51 3.0
1 2 4.0
2 32 NaN
3 99 0.0
4 81 23.0
df.loc[(df.b.notnull())&(df.b!=0),:]
Out[604]:
a b
0 51 3.0
1 2 4.0
4 81 23.0
Upvotes: 4