Reputation: 784
I have the following function, which takes as input a dataframe and another parameter named "ratio":
def grouper(df, ratio):
    if ratio > 0:
        # replace every value whose count in its column is below len(df) * ratio
        return df.apply(lambda x: x.mask(x.map(x.value_counts()) < len(df) * ratio, 'other'))
    return df
This function groups together, column by column, the values that appear less frequently: any value occurring fewer than len(df) * ratio times is replaced with 'other'.
If my Dataframe were to be something like
>>> df
   Country   Manager
0    Italy     Pippo
1   France     Pluto
2  Germany     Pippo
3    Italy     Pluto
4   France     Pippo
5    Spain     Pluto
6    Italy  Paperino
7   France  Topolino
8   Norway    Minnie
Then using the above-mentioned function I would have:
>>> grouper(df, 0.2)
   Country  Manager
0    Italy    Pippo
1   France    Pluto
2    other    Pippo
3    Italy    Pluto
4   France    Pippo
5    other    Pluto
6    Italy    other
7   France    other
8    other    other
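To make the replacement rule concrete, here is a minimal sketch that unrolls the lambda for the "Country" column only. The dataframe is rebuilt from the example above; the intermediate names (threshold, counts, per_row, is_rare, masked) are introduced here purely for illustration.

```python
import pandas as pd

# Example dataframe from the question
df = pd.DataFrame({
    "Country": ["Italy", "France", "Germany", "Italy", "France",
                "Spain", "Italy", "France", "Norway"],
    "Manager": ["Pippo", "Pluto", "Pippo", "Pluto", "Pippo",
                "Pluto", "Paperino", "Topolino", "Minnie"],
})

ratio = 0.2
threshold = len(df) * ratio             # 9 rows * 0.2 = 1.8

counts = df["Country"].value_counts()   # how often each country occurs
per_row = df["Country"].map(counts)     # count of the value found in each row
is_rare = per_row < threshold           # True where the value occurs only once

masked = df["Country"].mask(is_rare, "other")
print(masked.tolist())
# ['Italy', 'France', 'other', 'Italy', 'France', 'other', 'Italy', 'France', 'other']
```

With 9 rows and ratio 0.2 the threshold is 1.8, so only values that occur exactly once (Germany, Spain, Norway) are replaced.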
Now, I want to find a way to record which values have been changed. My desired output is something like this:
{
    "Country": ["Germany", "Spain", "Norway"],
    "Manager": ["Paperino", "Topolino", "Minnie"]
}
How can I obtain this?
Upvotes: 0
Views: 40
Reputation: 863226
Use a dictionary comprehension, filtering each column for the values below the threshold:
def grouper(df, ratio):
    if ratio > 0:
        # for each column, keep the unique values that occur fewer than len(df) * ratio times
        d = {x: df.loc[df[x].map(df[x].value_counts()) < len(df) * ratio, x].unique().tolist()
             for x in df.columns}
        return d
    return df
df = grouper(df, 0.2)
print(df)
{'Country': ['Germany', 'Spain', 'Norway'], 'Manager': ['Paperino', 'Topolino', 'Minnie']}
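If both the grouped dataframe and the list of replaced values are needed, the two steps can be combined. The following is only a sketch under that assumption: grouper_with_report is a hypothetical name, and returning a tuple is a design choice made for this example, not something the question requires.

```python
import pandas as pd

def grouper_with_report(df, ratio):
    """Return (grouped copy of df, dict of the values replaced in each column)."""
    if ratio <= 0:
        return df, {}
    threshold = len(df) * ratio
    grouped = df.copy()                       # leave the caller's dataframe untouched
    replaced = {}
    for col in df.columns:
        # True for rows whose value occurs fewer than `threshold` times in this column
        is_rare = df[col].map(df[col].value_counts()) < threshold
        replaced[col] = df.loc[is_rare, col].unique().tolist()
        grouped[col] = df[col].mask(is_rare, "other")
    return grouped, replaced

# Example dataframe from the question
df = pd.DataFrame({
    "Country": ["Italy", "France", "Germany", "Italy", "France",
                "Spain", "Italy", "France", "Norway"],
    "Manager": ["Pippo", "Pluto", "Pippo", "Pluto", "Pippo",
                "Pluto", "Paperino", "Topolino", "Minnie"],
})

grouped, replaced = grouper_with_report(df, 0.2)
print(replaced)
# {'Country': ['Germany', 'Spain', 'Norway'], 'Manager': ['Paperino', 'Topolino', 'Minnie']}
```

Because the rare values are read from the original column before masking, the dictionary lists them in order of first appearance.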
Upvotes: 1
Reputation: 784
I managed to do it in the most bloody way possible:
def grouper_cat(df, grouping):
    dictionaries = df.apply(
        lambda x: (
            lambda y=x.value_counts(): (
                # key on x.name (the column label): value_counts() results
                # are named 'count' in pandas >= 2.0
                lambda z=y[y < len(df) * grouping]: {x.name: z.index.tolist()}
            )()
        )()
    ).values
    result = {}
    for d in dictionaries:
        result.update(d)
    return result
Example:
>>> grouper_cat(df, 0.2)
{'Country': ['Norway', 'Germany', 'Spain'],
'Manager': ['Topolino', 'Paperino', 'Minnie']}
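The nested lambdas can also be written as a plain loop over the columns. This is a sketch of the same idea, not a drop-in replacement: rare_values is a hypothetical name, and the order of the values inside each list follows value_counts(), so it may differ from the order of first appearance (as in the output above).

```python
import pandas as pd

def rare_values(df, ratio):
    """Map each column to the values occurring fewer than len(df) * ratio times."""
    threshold = len(df) * ratio
    result = {}
    for col in df.columns:
        counts = df[col].value_counts()               # one entry per distinct value
        result[col] = counts[counts < threshold].index.tolist()
    return result

# Example dataframe from the question
df = pd.DataFrame({
    "Country": ["Italy", "France", "Germany", "Italy", "France",
                "Spain", "Italy", "France", "Norway"],
    "Manager": ["Pippo", "Pluto", "Pippo", "Pluto", "Pippo",
                "Pluto", "Paperino", "Topolino", "Minnie"],
})

rare = rare_values(df, 0.2)
print(rare)
```

Working on value_counts() directly touches each distinct value once instead of mapping counts back onto every row, which is one plausible reason this style runs faster.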
Compared to @jezrael's answer (the new, edited one, called grouper_cat_jez below), my solution is apparently faster:
>>> timeit(lambda : grouper_cat(df, 0.2), number=2500)
6.257032366998828
>>> timeit(lambda : grouper_cat_jez(df, 0.2), number=2500)
8.312444757999401
Upvotes: 0