Finding which values are being grouped together in a Pandas Dataframe

Question

I have the following function, which takes a DataFrame and another parameter named "ratio":

def grouper(df, ratio):
    # Replace values that occur in fewer than len(df) * ratio rows of a column with 'other'
    if ratio > 0:
        return df.apply(lambda x: x.mask(x.map(x.value_counts()) < len(df) * ratio, 'other'))
    return df

This function groups together the values that appear less frequently: in each column, any value that occurs in fewer than len(df) * ratio rows is replaced with 'other'.

If my Dataframe were to be something like

>>> df

   Country   Manager
0    Italy     Pippo
1   France     Pluto
2  Germany     Pippo
3    Italy     Pluto
4   France     Pippo
5    Spain     Pluto
6    Italy  Paperino
7   France  Topolino
8   Norway    Minnie

Then using the above-mentioned function I would have:

>>> grouper(df, 0.2)

  Country Manager
0   Italy   Pippo
1  France   Pluto
2   other   Pippo
3   Italy   Pluto
4  France   Pippo
5   other   Pluto
6   Italy   other
7  France   other
8   other   other
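
To see why those cells become 'other', the steps inside the lambda can be unrolled for a single column. This is only a sketch on the Country column from the example; the intermediate names (`counts`, `per_row`, `threshold`, `masked`) are introduced here for illustration and do not appear in the original function.

```python
import pandas as pd

country = pd.Series(
    ["Italy", "France", "Germany", "Italy", "France", "Spain", "Italy", "France", "Norway"],
    name="Country",
)
ratio = 0.2

# Number of occurrences of each distinct value in the column
counts = country.value_counts()

# Each row's value replaced by how often that value occurs
per_row = country.map(counts)

# Values seen in fewer rows than this are considered rare (9 * 0.2 = 1.8 here)
threshold = len(country) * ratio

# Rows whose value is rare are overwritten with 'other'
masked = country.mask(per_row < threshold, "other")
print(masked.tolist())
```

Germany, Spain and Norway each occur once, which is below the threshold, so only those rows are replaced.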

Now, I want to find a way to mark down which values have been changed. My desired output is something like this:

{
    "Country" : ["Germany", "Spain", "Norway"],
    "Manager" : ["Paperino", "Topolino", "Minnie"]
}

How can I obtain this?

Federico Dorato · Accepted Answer

I managed to do it in the most bloody way possible:

def grouper_cat(df, grouping):
    dictionaries = df.apply(
        lambda x: (
            lambda y=x.value_counts() : (
                lambda z =y[y



Example:

>>> grouper_cat(df, 0.2)

{'Country': ['Norway', 'Germany', 'Spain'],
 'Manager': ['Topolino', 'Paperino', 'Minnie']}
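
Since the nested-lambda listing above is incomplete, here is a plainer sketch that should produce the same kind of dictionary with an explicit loop. It is not the original `grouper_cat`: the name `rare_values` is hypothetical, and it assumes the same `len(df) * ratio` threshold that `grouper` uses.

```python
import pandas as pd


def rare_values(df, ratio):
    """Map each column name to the values that grouper() would replace with 'other'."""
    threshold = len(df) * ratio
    result = {}
    for col in df.columns:
        counts = df[col].value_counts()
        # Keep only the values that occur in fewer rows than the threshold
        result[col] = counts[counts < threshold].index.tolist()
    return result


df = pd.DataFrame({
    "Country": ["Italy", "France", "Germany", "Italy", "France",
                "Spain", "Italy", "France", "Norway"],
    "Manager": ["Pippo", "Pluto", "Pippo", "Pluto", "Pippo",
                "Pluto", "Paperino", "Topolino", "Minnie"],
})

rare = rare_values(df, 0.2)
print(rare)
```

The order of the values inside each list follows `value_counts()`, so tied values may come out in a different order than in the example above; sort the lists if a stable order matters.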


Note:

Compared to @jezrael's answer (the new, edited one), my solution appears to be faster:

>>> timeit(lambda : grouper_cat(df, 0.2), number=2500)
6.257032366998828

>>> timeit(lambda : grouper_cat_jez(df, 0.2), number=2500)
8.312444757999401
