anon_swe

Reputation: 9355

Pandas DataFrame: Remove row satisfying certain condition

I have a Pandas DataFrame called df, containing a column called _text. I want to remove all rows where the value in _text is not a string.

Initially I was doing this:

df['_text'] = df['_text'].apply(lambda t: t if isinstance(t, basestring) else '')

But that just sets it to the empty string.

How would I delete any row where the value in the _text column is not a string?

Thanks!

Upvotes: 1

Views: 2894

Answers (1)

jezrael

Reputation: 863501

You are close. You only need to return a boolean mask from apply and then use boolean indexing, which keeps only the rows whose value is a string (so every non-string value, such as a number, is removed):

df[df['_text'].apply(lambda t: isinstance(t, basestring))]

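Or compare the type directly. Note that type() returns the concrete class (str), never basestring, so the comparison has to be against str:

df[df['_text'].apply(type) == str]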

Sample:

import pandas as pd

df = pd.DataFrame({'_text': [1, 4, 'ss', '']})
print (df)
  _text
0     1
1     4
2    ss
3     

print (df['_text'].apply(lambda t: isinstance(t, basestring)))
0    False
1    False
2     True
3     True
Name: _text, dtype: bool

# type() returns str on Python 3; on Python 2 it returns str or unicode, never basestring
print (df['_text'].apply(type))
0    <class 'int'>
1    <class 'int'>
2    <class 'str'>
3    <class 'str'>
Name: _text, dtype: object

df1 = df[df['_text'].apply(lambda t: isinstance(t, basestring))]
print (df1)
  _text
2    ss
3      

df1 = df[df['_text'].apply(type) == str]
print (df1)
  _text
2    ss
3      
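On Python 3, where basestring no longer exists, the same boolean-mask idea can be sketched as below. This is only a minimal sketch: the helper name keep_string_rows is hypothetical (it is not part of pandas), and the extra None and 2.5 values are added purely to illustrate other non-string types.

```python
import pandas as pd


def keep_string_rows(frame, column):
    """Return a copy of frame with only the rows whose value in column is a str."""
    # Boolean mask: True where the cell holds a str instance.
    mask = frame[column].map(lambda value: isinstance(value, str))
    # Boolean indexing keeps the rows where the mask is True; copy() gives
    # an independent DataFrame rather than a slice of the original.
    return frame[mask].copy()


# Illustrative data: ints, strings (including an empty one), None and a float.
df = pd.DataFrame({'_text': [1, 4, 'ss', '', None, 2.5]})
df = keep_string_rows(df, '_text')
print(df['_text'].tolist())  # ['ss', '']
```

Reassigning the result to df is what actually drops the rows; the original index labels (2 and 3 here) are kept, so call reset_index(drop=True) afterwards if a fresh 0-based index is wanted.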

Upvotes: 3
