Reputation: 9355
I have a Pandas DataFrame called df, containing a column called _text. I want to remove all rows where the value in _text is not a string.
Initially I was doing this:
df['_text'] = df['_text'].apply(lambda t: t if isinstance(t, basestring) else '')
But that just sets it to the empty string.
How would I delete any row where the value in the _text column is not a string?
Thanks!
Upvotes: 1
Views: 2894
Reputation: 863501
You are close: you only need to return a boolean mask from apply and then use boolean indexing, which keeps only the rows whose values are strings (so all non-string values, such as numbers, are removed):
df[df['_text'].apply(lambda t: isinstance(t, basestring))]
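Or compare the exact type (use str here, because type() never returns basestring, which is only an abstract base class in Python 2):
df[df['_text'].apply(type) == str]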
Sample:
import pandas as pd

df = pd.DataFrame({'_text': [1, 4, 'ss', '']})
print (df)
  _text
0     1
1     4
2    ss
3
print (df['_text'].apply(lambda t: isinstance(t, basestring)))
0    False
1    False
2     True
3     True
Name: _text, dtype: bool
# Python 3 reports str for strings; on Python 2, basestring only works with isinstance, since type() returns str or unicode
print (df['_text'].apply(type))
0    <class 'int'>
1    <class 'int'>
2    <class 'str'>
3    <class 'str'>
Name: _text, dtype: object
df1 = df[df['_text'].apply(lambda t: isinstance(t, basestring))]
print (df1)
  _text
2    ss
3
df1 = df[df['_text'].apply(type) == str]
print (df1)
  _text
2    ss
3
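If the same code has to run on both Python 2 and Python 3, one option is to pick the string type once and reuse it in the mask. This is only a minimal sketch under that assumption: string_types is a name introduced here for illustration, not something pandas provides.

import pandas as pd

# Hypothetical helper name: resolve the string base type for the running interpreter.
try:
    string_types = basestring  # Python 2: covers both str and unicode
except NameError:
    string_types = str  # Python 3: basestring does not exist

df = pd.DataFrame({'_text': [1, 4, 'ss', '']})

# Boolean mask: True where the value is a string, False otherwise.
mask = df['_text'].apply(lambda t: isinstance(t, string_types))

# Boolean indexing keeps only the rows where the mask is True.
df1 = df[mask]
print(df1)

The empty string in row 3 is kept because it is still a string; only the two integer rows are dropped.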
Upvotes: 3