mcslane

Reputation: 335

Pandas - Selecting rows in a DataFrame using String equality

I am trying to get all rows from the DataFrame contributors whose contbr_occupation is 'RETIRED', like so:

mask = (contributors.contbr_occupation.str == 'RETIRED')
print(contributors[mask])

However, I get the following stack trace:

Traceback (most recent call last):
  File "C:\Users\Me\Anaconda3\envs\pandas\lib\site-packages\pandas\indexes\base.py", line 2134, in get_loc
    return self._engine.get_loc(key)
  File "pandas\index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas\index.c:4433)
  File "pandas\index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas\index.c:4279)
  File "pandas\src\hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:13742)
  File "pandas\src\hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:13696)
KeyError: False

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "census_attack.py", line 27, in <module>
    print(contributors[mask])
  File "C:\Users\Me\Anaconda3\envs\pandas\lib\site-packages\pandas\core\frame.py", line 2059, in __getitem__
    return self._getitem_column(key)
  File "C:\Users\Me\Anaconda3\envs\pandas\lib\site-packages\pandas\core\frame.py", line 2066, in _getitem_column
    return self._get_item_cache(key)
  File "C:\Users\Me\Anaconda3\envs\pandas\lib\site-packages\pandas\core\generic.py", line 1386, in _get_item_cache
    values = self._data.get(item)
  File "C:\Users\Me\Anaconda3\envs\pandas\lib\site-packages\pandas\core\internals.py", line 3543, in get
    loc = self.items.get_loc(item)
  File "C:\Users\Me\Anaconda3\envs\pandas\lib\site-packages\pandas\indexes\base.py", line 2136, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas\index.c:4433)
  File "pandas\index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas\index.c:4279)
  File "pandas\src\hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:13742)
  File "pandas\src\hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:13696)
KeyError: False

How can I do this?

Upvotes: 3

Views: 3023

Answers (2)

piRSquared

Reputation: 294218

You could use DataFrame.query:

contributors.query('contbr_occupation == "RETIRED"')
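A minimal runnable sketch of the query approach, assuming a small hypothetical contributors DataFrame in place of the real data. It also shows the @ prefix, which lets the query string refer to a local Python variable instead of a hard-coded literal.

```python
import pandas as pd

# Hypothetical stand-in for the question's contributors DataFrame
contributors = pd.DataFrame({
    "contbr_nm": ["A", "B", "C", "D"],
    "contbr_occupation": ["RETIRED", "ENGINEER", "RETIRED", "TEACHER"],
})

# String literal written inside the query expression
retired = contributors.query('contbr_occupation == "RETIRED"')

# Same filter, with the value taken from a local variable via the @ prefix
occupation = "RETIRED"
retired_var = contributors.query("contbr_occupation == @occupation")

print(retired)
```

The @ form is handy when the value comes from user input or a loop, since it avoids building the query string by hand.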

Upvotes: 2

miradulo

Reputation: 29690

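If you are performing a plain equality check (not containment or any other pattern match), don't use the str accessor; you don't need it. contributors.contbr_occupation.str is the accessor object itself, not the column's values, so comparing it to 'RETIRED' gives a single False instead of a boolean Series. contributors[False] is then treated as a lookup for a column named False, which is where KeyError: False comes from. Compare the Series directly instead: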

mask = (contributors.contbr_occupation == 'RETIRED')
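A small reproduction, assuming a toy contributors DataFrame with only the contbr_occupation column, that shows why the original mask fails and the direct comparison works:

```python
import pandas as pd

# Hypothetical toy version of the question's DataFrame
contributors = pd.DataFrame({"contbr_occupation": ["RETIRED", "ENGINEER", "RETIRED"]})

# Comparing the .str accessor object to a string is not element-wise:
# the accessor defines no __eq__, so Python falls back to identity and returns False
bad_mask = contributors.contbr_occupation.str == "RETIRED"
print(bad_mask)  # False

# A scalar False used as a key is looked up as a column label -> KeyError
try:
    contributors[bad_mask]
except KeyError as exc:
    print("KeyError:", exc)

# Comparing the Series itself yields one boolean per row
mask = contributors.contbr_occupation == "RETIRED"
print(contributors[mask])
```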

Example

>>> df

  strings
0     abc
1     def
2     ghi
3     abc

>>> df[df.strings == 'abc']

  strings
0     abc
3     abc
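The interactive example above as a self-contained script. The isin line is an addition beyond the original answer, showing the equality idea extended to several accepted values at once.

```python
import pandas as pd

df = pd.DataFrame({"strings": ["abc", "def", "ghi", "abc"]})

# Element-wise equality gives a boolean Series; indexing keeps the True rows
matches = df[df.strings == "abc"]
print(matches)

# Equality against any of several values
several = df[df.strings.isin(["abc", "ghi"])]
print(several)
```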

If you need a condition other than exact equality, such as containment, call an actual string method on the str accessor, for example str.contains:

mask = (contributors.contbr_occupation.str.contains('RETIRED'))
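A sketch of how containment differs from equality on some hypothetical occupation values. It uses the real na and regex parameters of str.contains: na=False turns missing values into False so the mask can be used for indexing, and regex=False treats the pattern as a literal substring. It also shows a case-insensitive exact match built by upper-casing before comparing.

```python
import pandas as pd

# Hypothetical occupation values, including a missing entry
occupations = pd.Series(["RETIRED", "SEMI-RETIRED", "Retired", None, "ENGINEER"])

# Substring match (case-sensitive): also catches "SEMI-RETIRED"; None -> False
contains = occupations.str.contains("RETIRED", na=False, regex=False)

# Case-insensitive exact match: normalise case first, then compare with ==
exact_ci = occupations.str.upper() == "RETIRED"

print(contains.tolist())
print(exact_ci.tolist())
```

Note that containment is looser than equality, so str.contains('RETIRED') will also select values like 'SEMI-RETIRED'; use == when only exact matches are wanted.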

Upvotes: 2
