Reputation: 23099
I'm trying to teach myself pandas and am playing around with different dtypes.
I have a df as follows:
import pandas as pd
df = pd.DataFrame({'ID': [0, 2, "bike", "cake"], 'Course': ['Test', 'Math', 'Store', 'History']})
print(df)
     ID   Course
0     0     Test
1     2     Math
2  bike    Store
3  cake  History
The dtype of ID is, of course, object. What I want to do is remove any rows in the DataFrame where the ID is a string.
I thought this would be as simple as:
df.ID.filter(regex='[\w]*')
but this returns everything. Is there a surefire method for dealing with such things?
Upvotes: 7
Views: 7299
Reputation: 51155
Wen's answer is the correct (and fastest) way to solve this, but to explain why your regular expression doesn't work, you have to understand what \w means.
\w matches any word character, i.e. [a-zA-Z0-9_]. Your pattern therefore matches letters as well as digits (and, because of the *, even the empty string), so everything is matched. A valid regular expression approach would be the following, with $ added because str.match only anchors the pattern at the start of the string:
df.loc[df.ID.astype(str).str.match(r'\d+$')]
  ID Course
0  0   Test
1  2   Math
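To illustrate the anchoring point, here is a minimal sketch comparing str.match with str.fullmatch. The value "2bike" is a made-up example that is not in the question's data; it is only there to show where the two methods disagree.

```python
import pandas as pd

# Hypothetical sample: "2bike" is not in the original data, it is added
# here only to show a value that starts with a digit but is not numeric.
ids = pd.Series([0, 2, "bike", "2bike"]).astype(str)

# str.match anchors the pattern at the start of the string only
starts_with_digits = ids.str.match(r"\d+")

# str.fullmatch requires the whole string to match the pattern
all_digits = ids.str.fullmatch(r"\d+")

print(starts_with_digits.tolist())  # [True, True, False, True]
print(all_digits.tolist())          # [True, True, False, False]
```

If your IDs could contain values like "2bike", str.fullmatch (or an explicit $ at the end of the pattern) is probably the safer choice.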
The second issue is your use of filter. It isn't filtering the values in your ID column; it is filtering the index labels. A valid solution using filter would be as follows:
df.set_index('ID').filter(regex=r'^\d+$', axis=0)
   Course
ID
0    Test
2    Math
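As a rough sketch of the difference, the snippet below applies the same pattern once to the default index and once after moving ID into the index. The variable names are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'ID': [0, 2, "bike", "cake"],
                   'Course': ['Test', 'Math', 'Store', 'History']})

# filter() inspects labels, not values. With the default index the labels
# are 0, 1, 2, 3, so a digits-only pattern keeps every row.
by_default_index = df.ID.filter(regex=r'^\d+$')

# After set_index the labels are the ID values themselves, so the same
# pattern now drops "bike" and "cake".
by_id_index = df.set_index('ID').filter(regex=r'^\d+$', axis=0)

print(len(by_default_index))
print(by_id_index['Course'].tolist())
```

This is why the original call appeared to "return everything": it never looked at the contents of ID at all.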
Upvotes: 5
Reputation: 43504
Another option is to convert the column to string and use str.match (with a raw string, and $ so that the whole value has to be digits):
print(df[df['ID'].astype(str).str.match(r"\d+$")])
#  Course ID
#0   Test  0
#1   Math  2
Your code does not work because, as stated in the docs for pandas.DataFrame.filter:
Note that this routine does not filter a dataframe on its contents. The filter is applied to the labels of the index.
Upvotes: 5
Reputation: 323236
You can use to_numeric:
df[pd.to_numeric(df.ID, errors='coerce').notnull()]
Out[450]:
  Course ID
0   Test  0
1   Math  2
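A slightly longer sketch of the same idea, in case it helps to see the intermediate step. It assumes you also want ID to end up as an integer column, which the question does not ask for. Note that to_numeric will also accept numeric strings such as "3", so those rows would be kept.

```python
import pandas as pd

df = pd.DataFrame({'ID': [0, 2, "bike", "cake"],
                   'Course': ['Test', 'Math', 'Store', 'History']})

# Values that cannot be parsed as numbers become NaN instead of raising
numeric_ids = pd.to_numeric(df['ID'], errors='coerce')

# Keep only the rows where the conversion succeeded
cleaned = df[numeric_ids.notna()].copy()

# Assumption: converting the remaining IDs to int is desired
cleaned['ID'] = cleaned['ID'].astype(int)

print(cleaned)
print(cleaned.dtypes)
```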
Upvotes: 6