Jonathan

Reputation: 125

How to apply MinMaxScaler separately to each category in a dataframe

I have a dataframe as below:

import pandas as pd

df = pd.DataFrame({

'category': ['fruits','fruits','fruits','fruits','fruits','vegetables','vegetables','vegetables','vegetables','vegetables'],
'product' : ['apple','orange','durian','coconut','grape','cabbage','carrot','spinach','grass','potato'],
'sales'   : [10,20,30,40,100,10,30,50,60,100]

})

df.head(15)

Current method: normalize a single category of df manually:

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()

# .copy() avoids SettingWithCopyWarning when assigning to the filtered slice
df_fruits = df[df['category'] == "fruits"].copy()
df_fruits['sales'] = scaler.fit_transform(df_fruits[['sales']])
df_fruits.head()
# to_csv is a DataFrame method, so call it on df_fruits rather than on pd
df_fruits.to_csv('minmax/output/category-{}-minmax.csv'.format('XX'))

Questions:
- How do I loop over all the categories in df?
- How do I export a CSV file for each category, with the category name in the filename?

Thanks a lot.

Upvotes: 0

Views: 394

Answers (2)

user2755526

Reputation: 197

Looks like you have to perform some function gymnastics for this to work.

Your dataframe.

import pandas as pd

df = pd.DataFrame({

'category': ['fruits','fruits','fruits','fruits','fruits','vegetables','vegetables','vegetables','vegetables','vegetables'],
'product' : ['apple','orange','durian','coconut','grape','cabbage','carrot','spinach','grass','potato'],
'sales'   : [10,20,30,40,100,10,30,50,60,100]

})
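from sklearn.preprocessing import MinMaxScaler

def minmax_wrapper(x):
    # Scale one group's values to [0, 1]; MinMaxScaler expects a 2-D array
    scaler = MinMaxScaler()
    scaled = scaler.fit_transform(x.values.reshape(-1, 1)).flatten()
    # Keep the group's own index so transform aligns the result with df
    return pd.Series(scaled, index=x.index)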

Now apply it to your grouped dataframe.

df['scaled_sales'] = df.groupby('category')['sales'].transform(minmax_wrapper)

Voila!
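If you would rather not depend on sklearn for this step, here is a minimal sketch of the same per-group scaling written as plain pandas arithmetic. It assumes the default MinMaxScaler range of 0 to 1, and the helper name `minmax` is just something I made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['fruits'] * 5 + ['vegetables'] * 5,
    'product': ['apple', 'orange', 'durian', 'coconut', 'grape',
                'cabbage', 'carrot', 'spinach', 'grass', 'potato'],
    'sales': [10, 20, 30, 40, 100, 10, 30, 50, 60, 100],
})

def minmax(s):
    # Hypothetical helper: (x - min) / (max - min) within one group.
    # Assumes the group has at least two distinct values; a constant
    # group would divide by zero and produce NaN.
    return (s - s.min()) / (s.max() - s.min())

# transform keeps the original row order and index
df['scaled_sales'] = df.groupby('category')['sales'].transform(minmax)
print(df)
```

Each category ends up with its own minimum mapped to 0 and its own maximum mapped to 1. Note that MinMaxScaler handles a constant column differently (no NaN), so the two approaches only agree when each group has some spread.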

You can iterate through your groups using

# Untested, but I believe this should work
for category, grouped in df.groupby('category'):
    grouped.to_csv(f"minmax/output/category-{category}-minmax.csv")
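One thing to watch for: to_csv does not create missing folders, so the loop fails if the output directory is not there yet. Below is a rough sketch that creates the directory first. The function name `export_by_category` is hypothetical, `index=False` is my own choice (drop it if you want the row index in the file), and the demo writes to a temporary folder instead of your real path.

```python
from pathlib import Path
import tempfile

import pandas as pd

def export_by_category(df, out_dir):
    # Hypothetical helper: write one CSV per category into out_dir
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create missing folders
    paths = []
    for category, grouped in df.groupby('category'):
        path = out_dir / f"category-{category}-minmax.csv"
        grouped.to_csv(path, index=False)  # assumption: row index not wanted
        paths.append(path)
    return paths

df = pd.DataFrame({
    'category': ['fruits', 'fruits', 'vegetables', 'vegetables'],
    'product': ['apple', 'grape', 'cabbage', 'potato'],
    'sales': [10, 100, 10, 100],
})

# Demo in a temporary directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    written = export_by_category(df, Path(tmp) / 'minmax' / 'output')
    names = sorted(p.name for p in written)
    print(names)
```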

Upvotes: 0

Henry Yik

Reputation: 22493

Use Series.unique:

for i in df["category"].unique():
    # .copy() avoids SettingWithCopyWarning when assigning to the filtered slice
    cat = df[df['category'] == i].copy()
    cat['sales'] = scaler.fit_transform(cat[['sales']])
    cat.to_csv('minmax/output/category-{}-minmax.csv'.format(i))
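Reusing one scaler across categories is fine here because fit_transform refits on every call, so each category is scaled against its own min and max. A small self-contained sketch to check that, with the CSV export left out; the `scaled` dict is only there for the illustration.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    'category': ['fruits'] * 5 + ['vegetables'] * 5,
    'product': ['apple', 'orange', 'durian', 'coconut', 'grape',
                'cabbage', 'carrot', 'spinach', 'grass', 'potato'],
    'sales': [10, 20, 30, 40, 100, 10, 30, 50, 60, 100],
})

scaler = MinMaxScaler()
scaled = {}  # illustration only: category name -> scaled copy
for name in df['category'].unique():
    cat = df[df['category'] == name].copy()
    # fit_transform refits the scaler on this category's values only
    cat['sales'] = scaler.fit_transform(cat[['sales']]).ravel()
    scaled[name] = cat

for name, cat in scaled.items():
    print(name, cat['sales'].min(), cat['sales'].max())
```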

Upvotes: 1
