Reputation:
I have a data frame in which one column holds sub-instances of larger groups, and I want to categorize those values into a smaller number of groups. How do I do this?
Consider the following sample data:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': np.random.randn(60),
    'b': np.random.choice([5, 7, np.nan], 60),
    'c': np.random.choice(['panda', 'elephant', 'python', 'anaconda', 'shark', 'clown fish'], 60),
    # some ways to create systematic groups for indexing or groupby
    'e': np.tile(range(20), 3),
})
I would now like a new column in which, e.g., panda and elephant are categorized as mammals, etc.
Upvotes: 1
Views: 58
Reputation: 862661
I think you need map with fillna to replace the NaNs produced for values that have no match:
#borrowed dict from Ivo's answer
mapping_dict = {'panda': 'mammal', 'elephant': 'mammal',
'python': 'snake', 'anaconda': 'snake',
'shark': 'fish', 'clown fish': 'fish'}
df['d'] = df['c'].map(mapping_dict).fillna('not_matched')
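To make the role of fillna concrete, here is a minimal sketch on a small hand-written Series. The name animals and the value 'whale' are made up for illustration; 'whale' is deliberately missing from the dictionary so that map leaves a NaN behind.

```python
import pandas as pd

# Hypothetical sample: 'whale' is deliberately absent from the mapping.
animals = pd.Series(['panda', 'python', 'shark', 'whale'])

mapping_dict = {'panda': 'mammal', 'elephant': 'mammal',
                'python': 'snake', 'anaconda': 'snake',
                'shark': 'fish', 'clown fish': 'fish'}

# map looks each value up in the dict; keys that are missing become NaN.
mapped = animals.map(mapping_dict)

# fillna replaces those NaNs with an explicit label.
labelled = mapped.fillna('not_matched')
```

Without the fillna step, the unmatched row would stay NaN, which may or may not be what you want downstream.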
Also, if you can change the format of the dictionary, you can keep it as group -> members and generate the final dictionary by swapping keys and values:
d = {'mammal':['panda','elephant'],
'snake':['python','anaconda'],
'fish':['shark','clown fish']}
mapping_dict = {k: oldk for oldk, oldv in d.items() for k in oldv}
df['d'] = df['c'].map(mapping_dict).fillna('not_matched')
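The inversion step can be checked on its own, without pandas. This sketch uses the same d as above; the loop variable names group, members and member are only chosen here for readability.

```python
d = {'mammal': ['panda', 'elephant'],
     'snake': ['python', 'anaconda'],
     'fish': ['shark', 'clown fish']}

# Flatten group -> [members] into member -> group.
mapping_dict = {member: group
                for group, members in d.items()
                for member in members}

# Each animal is now a key that points at its group.
panda_group = mapping_dict['panda']
```

One thing to keep in mind: if the same animal were listed under two groups, the later group would silently overwrite the earlier one, because dict keys are unique.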
Upvotes: 0
Reputation: 4200
The most intuitive approach would be to create a dict, take the column as a series, and then remap it according to the dict:
mapping_dict = {'panda': 'mammal', 'elephant': 'mammal', 'python': 'snake', 'anaconda': 'snake', 'shark': 'fish', 'clown fish': 'fish'}
c_Series = pd.Series(df['c']) # create new series
classified_c = c_Series.map(mapping_dict) # remap new series
if 'c_classified' not in df.columns: df.insert(3, 'c_classified', classified_c) # insert only if not in df already (in case you run the code multiple times)
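The guard matters because DataFrame.insert raises a ValueError when the column already exists. Below is a small sketch of the same idea wrapped in a function; the helper name add_classification, the three-row frame, and the choice to place the new column right after 'c' (instead of the hard-coded position 3) are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical miniature frame standing in for the question's df.
df = pd.DataFrame({'c': ['panda', 'python', 'shark']})
mapping_dict = {'panda': 'mammal', 'python': 'snake', 'shark': 'fish'}

def add_classification(frame):
    """Insert 'c_classified' right after 'c' unless it already exists."""
    if 'c_classified' not in frame.columns:
        # Position is computed from 'c' rather than hard-coded.
        position = frame.columns.get_loc('c') + 1
        frame.insert(position, 'c_classified', frame['c'].map(mapping_dict))
    return frame

add_classification(df)
add_classification(df)  # second call changes nothing thanks to the guard
```

If you do not care where the new column ends up, plain assignment with df['c_classified'] = ... is also safe to rerun, since it simply overwrites the column.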
Upvotes: 1