Wookeun Lee

Reputation: 463

Conditional delete in pandas dataframe

I want to delete every row that contains a specific string in a DataFrame.

Specifically, I want to delete the rows with an abnormal email address (one ending in .jpg).

Here's my code. What's wrong with it?

df = pd.DataFrame({'email':['[email protected]', '[email protected]', '[email protected]', '[email protected]']})

df

             email
0    [email protected]
1    [email protected]
2       [email protected]
3  [email protected]

for i, r in df.iterrows():
    if df.loc[i,'email'][-3:] == 'com':
        df.drop(df.index[i], inplace=True) 

Traceback (most recent call last):

  File "<ipython-input-84-4f12d22e5e4c>", line 2, in <module>
    if df.loc[i,'email'][-3:] == 'com':

  File "C:\Anaconda\lib\site-packages\pandas\core\indexing.py", line 1472, in __getitem__
    return self._getitem_tuple(key)

  File "C:\Anaconda\lib\site-packages\pandas\core\indexing.py", line 870, in _getitem_tuple
    return self._getitem_lowerdim(tup)

  File "C:\Anaconda\lib\site-packages\pandas\core\indexing.py", line 998, in _getitem_lowerdim
    section = self._getitem_axis(key, axis=i)

  File "C:\Anaconda\lib\site-packages\pandas\core\indexing.py", line 1911, in _getitem_axis
    self._validate_key(key, axis)

  File "C:\Anaconda\lib\site-packages\pandas\core\indexing.py", line 1798, in _validate_key
    error()

  File "C:\Anaconda\lib\site-packages\pandas\core\indexing.py", line 1785, in error
    axis=self.obj._get_axis_name(axis)))

KeyError: 'the label [2] is not in the [index]'

Upvotes: 0

Views: 58

Answers (1)

sacuL

Reputation: 51425

IIUC, you can filter with a boolean mask rather than iterating through your frame with iterrows. This keeps only the rows whose address ends in .com:

df = df[df.email.str.endswith('.com')]

which returns:

>>> df
             email
0    [email protected]
1    [email protected]
3  [email protected]
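Since the question asks to remove the .jpg rows specifically, the same idea can also be written with the mask inverted. This is only a minimal sketch: the addresses below are made-up placeholders (the ones in the post are redacted), and `is_jpg` and `cleaned` are names chosen for illustration.

```python
import pandas as pd

# Placeholder addresses (assumption): the real ones are redacted in the post.
df = pd.DataFrame({'email': ['alice@example.com', 'bob@example.com',
                             'photo@example.jpg', 'carol@example.com']})

# Boolean Series: True where the address ends in '.jpg'.
is_jpg = df.email.str.endswith('.jpg')

# ~ inverts the mask, so only the rows that do NOT end in '.jpg' are kept.
cleaned = df[~is_jpg]

print(cleaned)
```

Unlike the `.com` filter, this keeps any address with another valid ending (for example `.org`), which may be closer to what was asked.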

Or, for larger dataframes, it is sometimes faster to skip the pandas str methods and build the mask with a plain list comprehension using Python's built-in string methods:

df = df[[i.endswith('.com') for i in df.email]]
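As for the KeyError itself, a likely explanation is that the loop mixes labels with positions while dropping rows in place: `i` is an index label, but `df.index[i]` looks it up by position, and the positions shift after each drop. The loop also tests for `'com'`, so it would remove the valid addresses rather than the .jpg one. The sketch below illustrates the shift and one loop-based alternative; the addresses are made-up placeholders, and `to_drop` and `fixed` are hypothetical names.

```python
import pandas as pd

# Placeholder addresses (assumption): the real ones are redacted in the post.
emails = ['alice@example.com', 'bob@example.com',
          'photo@example.jpg', 'carol@example.com']

df = pd.DataFrame({'email': emails})

# Dropping label 0 in place leaves the labels 1, 2, 3 ...
df.drop(df.index[0], inplace=True)
remaining = df.index.tolist()

# ... so position 1 now points at label 2, not label 1.
label_at_position_1 = df.index[1]

# If a loop is really wanted: collect the labels first, then drop once.
df2 = pd.DataFrame({'email': emails})
to_drop = [label for label, row in df2.iterrows()
           if row['email'].endswith('.jpg')]
fixed = df2.drop(to_drop)

print(remaining, label_at_position_1)
print(fixed)
```

Dropping by the collected labels avoids both problems, though the boolean mask is still the simpler option.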

Upvotes: 1
