Reputation: 2097
Let's say I have a dataframe with values like:
Food
----
Turkey
Tomato
Rice
Chicken
Lettuce
And I want to add a category so that it looks like:
Food Category
---- ----
Turkey Meat
Tomato Vegetable
Rice Grain
Chicken Meat
Lettuce Vegetable
In reality I have ~100 distinct values that I want to sort manually into ~10 groups.
I have been trying to script the categories in directly, rather than linking up a database or spreadsheet. What I have tried so far is shown below, along with the error it raises. Is there a better way to achieve this?
Current Code:
df.loc[df.Food.any(
[
'Turkey'
,'Chicken'
]
)
, 'Category'] = 'Meat'
df.loc[df.Food.any(
[
'Tomato'
,'Lettuce'
]
)
, 'Category'] = 'Vegetable'
ERROR:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-49-41349bcd38a0> in <module>
41 ]
42 )
---> 43 , 'Category'] = 'Meat'
~\AppData\Local\Continuum\miniconda3\lib\site-packages\pandas\core\generic.py in logical_func(self, axis, bool_only, skipna, level, **kwargs)
11721 skipna=skipna,
11722 numeric_only=bool_only,
> 11723 filter_type="bool",
11724 )
11725
~\AppData\Local\Continuum\miniconda3\lib\site-packages\pandas\core\series.py in _reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
4061
4062 if axis is not None:
-> 4063 self._get_axis_number(axis)
4064
4065 if isinstance(delegate, Categorical):
~\AppData\Local\Continuum\miniconda3\lib\site-packages\pandas\core\generic.py in _get_axis_number(cls, axis)
400 @classmethod
401 def _get_axis_number(cls, axis):
--> 402 axis = cls._AXIS_ALIASES.get(axis, axis)
403 if is_integer(axis):
404 if axis in cls._AXIS_NAMES:
TypeError: unhashable type: 'list'
Upvotes: 0
Views: 401
Reputation: 9019
I would recommend storing your mapping in a dictionary, with the categories as the keys and the list of foods belonging to each category as the values, like so:
mapping = {'Meat': ['Turkey','Chicken'], 'Vegetable': ['Tomato','Lettuce'], 'Grain': ['Rice']}
Then you can use pd.Series.map, inverting the dictionary so that each food becomes a key pointing to its category:
df['Category'] = df['Food'].map({i: k for k, v in mapping.items() for i in v})
Yields:
Food Category
0 Turkey Meat
1 Tomato Vegetable
2 Rice Grain
3 Chicken Meat
4 Lettuce Vegetable
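For completeness, here is a self-contained sketch of the same idea, plus a variant closer to the original df.loc attempt. The traceback in the question shows the list being handed to any() as its axis argument, which is why it fails; Series.isin is the membership test that takes a list. The names food_to_category and Category_isin, the extra 'Tofu' row, and the 'Uncategorized' fallback are my own assumptions for illustration and are not part of the question's data.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical unmapped value ('Tofu')
df = pd.DataFrame({"Food": ["Turkey", "Tomato", "Rice", "Chicken", "Lettuce", "Tofu"]})

# Category -> list of foods, as in the answer above
mapping = {
    "Meat": ["Turkey", "Chicken"],
    "Vegetable": ["Tomato", "Lettuce"],
    "Grain": ["Rice"],
}

# Invert to food -> category so it can be passed to Series.map
food_to_category = {
    food: category
    for category, foods in mapping.items()
    for food in foods
}

# Foods missing from the dictionary map to NaN; the fallback label is an assumption
df["Category"] = df["Food"].map(food_to_category).fillna("Uncategorized")

# Variant in the style of the original attempt: isin builds the boolean mask
df["Category_isin"] = "Uncategorized"
for category, foods in mapping.items():
    df.loc[df["Food"].isin(foods), "Category_isin"] = category

print(df)
```

With ~100 values, the map version is probably easier to maintain, since the whole classification lives in one dictionary. Dropping the fillna call leaves NaN for unmapped foods, which can be a handy way to spot values you forgot to categorize.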
Upvotes: 1