Assign column contents to categories

Question

I have a data frame with one column whose values are specific instances of broader groups, and I want to categorize them into a smaller number of groups. How do I do this?

Consider the following sample data:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': np.random.randn(60),
    'b': np.random.choice([5, 7, np.nan], 60),
    'c': np.random.choice(['panda', 'elephant', 'python', 'anaconda', 'shark', 'clown fish'], 60),

    # some ways to create systematic groups for indexing or groupby
    'e': np.tile(range(20), 3),

    # a date range and set of random dates
})

I would now like a new column in which, e.g., panda and elephant are categorized as mammals, etc.

Ivo · Accepted Answer

The most intuitive approach is to create a dict that maps each value to its category and then remap the column according to it:

mapping_dict = {'panda': 'mammal', 'elephant': 'mammal', 'python': 'snake', 'anaconda': 'snake', 'shark': 'fish', 'clown fish': 'fish'}

classified_c = df['c'].map(mapping_dict)     # map each value in column 'c' to its category
# insert only if the column is not already in df, so the code can be run multiple times
if 'c_classified' not in df.columns:
    df.insert(3, 'c_classified', classified_c)
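
As a variation on the above, here is a minimal self-contained sketch, not part of the original answer. It assumes you would rather write the dict as group -> members and invert it, and that values missing from the dict should get an explicit label. The `groups` dict, the extra value 'axolotl', and the label 'other' are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame standing in for the question's sample data;
# 'axolotl' is deliberately left out of the mapping below.
animals = ['panda', 'elephant', 'python', 'anaconda', 'shark', 'clown fish', 'axolotl']
df = pd.DataFrame({'c': np.random.choice(animals, 60)})

# Writing group -> members can be easier to maintain than member -> group.
groups = {
    'mammal': ['panda', 'elephant'],
    'snake': ['python', 'anaconda'],
    'fish': ['shark', 'clown fish'],
}

# Invert it into the member -> group dict that Series.map expects.
mapping_dict = {member: group
                for group, members in groups.items()
                for member in members}

# Plain assignment overwrites the column on re-runs, so no existence check is needed.
# Values missing from the dict map to NaN; fillna gives them an explicit label.
df['c_classified'] = df['c'].map(mapping_dict).fillna('other')

# Count rows per category.
print(df.groupby('c_classified').size())
```

Plain assignment appends the new column at the end of the frame; if the column position matters, df.insert as shown above is the way to control it.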
