How to group by similar rows in pandas

Question

I applied groupby to a DataFrame:

df.groupby('Category').sum()

after which the resulting DataFrame looks like this:

              height      weight
General    42.849980  157.500553
GENERAL    49.607315  177.340407
Genera     56.293531  171.524640
CategoryA  48.421077  144.251986
CategoryB  48.421077  144.251986
CategoryC  48.421077  144.251986

I need to combine General, GENERAL and Genera into a single row, so that the result looks like this:

General    123.849980  300.500553
CategoryA   48.421077  144.251986
CategoryB   48.421077  144.251986
CategoryC   48.421077  144.251986

How can I accomplish this?

Edit: The regex solution works. Is there a way to also put General, GENERAL, Genera and CategoryA into a single group?

Piotr · Accepted Answer

Assuming that the category you are grouping by is in the index, you can do:

import re

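# regex=True is needed on pandas >= 2.0, where str.replace
# treats the pattern as a literal string by default.
result = (
    df
    .groupby(df.index.str.replace("genera.*", "General", flags=re.IGNORECASE, regex=True))
    .sum()
)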
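To try this end to end, here is a minimal sketch that rebuilds the frame from the question with Category as the index (the construction of df is my assumption, the values are copied from the question) and applies the same regex-based key:

```python
import re

import pandas as pd

# Frame rebuilt from the question's output; Category is the index (assumed layout).
df = pd.DataFrame(
    {
        "height": [42.849980, 49.607315, 56.293531, 48.421077, 48.421077, 48.421077],
        "weight": [157.500553, 177.340407, 171.524640, 144.251986, 144.251986, 144.251986],
    },
    index=pd.Index(
        ["General", "GENERAL", "Genera", "CategoryA", "CategoryB", "CategoryC"],
        name="Category",
    ),
)

# Collapse every label that matches "genera..." (case-insensitive) to "General".
keys = df.index.str.replace("genera.*", "General", flags=re.IGNORECASE, regex=True)

# Group by the normalized labels instead of the raw index.
result = df.groupby(keys).sum()
print(result)
```

With these sample values the merged General row should come out at roughly 148.75 for height and 506.37 for weight, so the figures in the question's desired output look illustrative rather than exact.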

Edit: If you don't want to use a regex, or you want to fold these labels into an existing group such as CategoryA, you can use .map with an explicit mapping instead. The example below assumes that your categories are in a column named Category:

mapping = {
    "General": "CategoryA",
    "GENERAL": "CategoryA",
    "Genera": "CategoryA",
}
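# Labels missing from the mapping become NaN after .map,
# so fillna restores their original value.
result = (
    df
    .groupby(df.Category.map(mapping).fillna(df.Category))
    # numeric_only=True keeps the string Category column out of the sum
    .sum(numeric_only=True)
)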
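For completeness, a self-contained sketch of the mapping approach, assuming the categories live in a regular column named Category (the frame is rebuilt from the question's values, and keys is just an illustrative name):

```python
import pandas as pd

# Frame rebuilt from the question, with Category as a regular column (assumed layout).
df = pd.DataFrame(
    {
        "Category": ["General", "GENERAL", "Genera", "CategoryA", "CategoryB", "CategoryC"],
        "height": [42.849980, 49.607315, 56.293531, 48.421077, 48.421077, 48.421077],
        "weight": [157.500553, 177.340407, 171.524640, 144.251986, 144.251986, 144.251986],
    }
)

# Every spelling of "General" is folded into CategoryA.
mapping = {
    "General": "CategoryA",
    "GENERAL": "CategoryA",
    "Genera": "CategoryA",
}

# .map returns NaN for labels that are not in the mapping; fillna puts the original back.
keys = df["Category"].map(mapping).fillna(df["Category"])

# numeric_only=True prevents the string Category column from being concatenated.
result = df.groupby(keys).sum(numeric_only=True)
print(result)
```

Here CategoryA absorbs the three General variants plus its own row, while CategoryB and CategoryC pass through unchanged. An explicit dictionary is more verbose than a regex, but it makes every merge visible and cannot match labels you did not intend.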
