Sam CD

Reputation: 2097

Best Way to Manually Assign Categories

Let's say I have a dataframe with values like:


Food
----
Turkey
Tomato
Rice
Chicken
Lettuce

And I want to add a category so that it looks like:

Food        Category
----        ----
Turkey      Meat
Tomato      Vegetable
Rice        Grain
Chicken     Meat
Lettuce     Vegetable

In reality I have ~100 distinct values that I want to sort into ~10 groups, and I want to assign the groups manually.

I have been trying to script the assignments directly rather than linking to a database or spreadsheet. My attempt so far is below, along with the error it raises. Is there a better way to achieve this?

Current Code:

df.loc[df.Food.any(
    [
    'Turkey'
    ,'Chicken'
]
)
         , 'Category'] = 'Meat' 

df.loc[df.Food.any(
    [
    'Tomato'
    ,'Lettuce'
]
)
         , 'Category'] = 'Vegetable' 

ERROR:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-49-41349bcd38a0> in <module>
     41     ]
     42 )
---> 43          , 'Category'] = 'Meat' 

~\AppData\Local\Continuum\miniconda3\lib\site-packages\pandas\core\generic.py in logical_func(self, axis, bool_only, skipna, level, **kwargs)
  11721             skipna=skipna,
  11722             numeric_only=bool_only,
> 11723             filter_type="bool",
  11724         )
  11725 

~\AppData\Local\Continuum\miniconda3\lib\site-packages\pandas\core\series.py in _reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
   4061 
   4062         if axis is not None:
-> 4063             self._get_axis_number(axis)
   4064 
   4065         if isinstance(delegate, Categorical):

~\AppData\Local\Continuum\miniconda3\lib\site-packages\pandas\core\generic.py in _get_axis_number(cls, axis)
    400     @classmethod
    401     def _get_axis_number(cls, axis):
--> 402         axis = cls._AXIS_ALIASES.get(axis, axis)
    403         if is_integer(axis):
    404             if axis in cls._AXIS_NAMES:

TypeError: unhashable type: 'list'

Upvotes: 0

Views: 401

Answers (1)

rahlf23

Reputation: 9019

I would recommend storing your mapping values in a dictionary with the categories as the keys and the list of options that correspond to that category as the values, like so:

mapping = {'Meat': ['Turkey','Chicken'], 'Vegetable': ['Tomato','Lettuce'], 'Grain': ['Rice']}

Then you can invert it into a food-to-category lookup with a dict comprehension and pass that to pd.Series.map:

df['Category'] = df['Food'].map({i: k for k, v in mapping.items() for i in v})

Yields:

      Food   Category
0   Turkey       Meat
1   Tomato  Vegetable
2     Rice      Grain
3  Chicken       Meat
4  Lettuce  Vegetable
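
For reference, here is a self-contained sketch of the same idea that can be run end to end. The extra row 'Quinoa', the 'Uncategorized' label, and the names food_to_category, unmapped, and Category_isin are assumptions added for illustration and are not part of the original data. The sketch also shows an isin-based variant of the question's df.loc approach: judging by the traceback, Series.any in that pandas version took the list as its axis argument, which would explain the "unhashable type: 'list'" error, whereas Series.isin is the method that tests membership in a list.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical unmapped row ('Quinoa')
df = pd.DataFrame(
    {"Food": ["Turkey", "Tomato", "Rice", "Chicken", "Lettuce", "Quinoa"]}
)

# Category -> list of foods, as in the answer
mapping = {
    "Meat": ["Turkey", "Chicken"],
    "Vegetable": ["Tomato", "Lettuce"],
    "Grain": ["Rice"],
}

# Invert to food -> category so each row needs a single lookup
food_to_category = {
    food: category for category, foods in mapping.items() for food in foods
}

df["Category"] = df["Food"].map(food_to_category)

# Foods absent from the mapping come back as NaN; list them so they can be
# added to the mapping by hand
unmapped = df.loc[df["Category"].isna(), "Food"].unique().tolist()

# Assumed placeholder label for anything still unmapped
df["Category"] = df["Category"].fillna("Uncategorized")

# Variant closer to the question's code: isin() builds the boolean mask
# that any() could not
df["Category_isin"] = "Uncategorized"
for category, foods in mapping.items():
    df.loc[df["Food"].isin(foods), "Category_isin"] = category

print(df)
print("Unmapped foods:", unmapped)
```

With ~100 values and ~10 groups, the single map call is probably easier to maintain than ten separate df.loc assignments, and checking unmapped after each run is a cheap way to catch foods that have not been categorized yet.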

Upvotes: 1
