Reputation: 125
I have a DataFrame as below:
import pandas as pd
df = pd.DataFrame({
'category': ['fruits','fruits','fruits','fruits','fruits','vegetables','vegetables','vegetables','vegetables','vegetables'],
'product' : ['apple','orange','durian','coconut','grape','cabbage','carrot','spinach','grass','potato'],
'sales' : [10,20,30,40,100,10,30,50,60,100]
})
df.head(15)
Current method: normalize the sales of a single category in df, manually:
from sklearn import preprocessing
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
# .copy() avoids pandas' SettingWithCopyWarning on the assignment below
df_fruits = df[df['category'] == "fruits"].copy()
df_fruits['sales'] = scaler.fit_transform(df_fruits[['sales']])
df_fruits.head()
df_fruits.to_csv('minmax/output/category-{}-minmax.csv'.format('XX'))
Questions:
- How do I loop through all the categories in df and normalize each one?
- How do I then export one CSV file per category, with the category name in the file name?
Thanks a lot.
Upvotes: 0
Views: 394
Reputation: 197
Looks like you have to perform some function gymnastics for this to work.
Your DataFrame:
import pandas as pd
df = pd.DataFrame({
'category': ['fruits','fruits','fruits','fruits','fruits','vegetables','vegetables','vegetables','vegetables','vegetables'],
'product' : ['apple','orange','durian','coconut','grape','cabbage','carrot','spinach','grass','potato'],
'sales' : [10,20,30,40,100,10,30,50,60,100]
})
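def minmax_wrapper(x):
    from sklearn.preprocessing import MinMaxScaler
    scaler = MinMaxScaler()
    # MinMaxScaler expects a 2-D array, so reshape to one column, then flatten the result.
    scaled = scaler.fit_transform(x.values.reshape(-1, 1)).flatten()
    # Reuse x.index so the scaled values stay aligned with the original rows.
    return pd.Series(scaled, index=x.index)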
Now apply it to your grouped dataframe.
df['scaled_sales'] = df.groupby('category')['sales'].transform(minmax_wrapper)
Voila!
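For comparison, min-max scaling is just (x - min) / (max - min) within each group, so the same column can be sketched in plain pandas. This is only a minimal sketch: the helper name minmax and the column name scaled_sales_pd are made up for illustration, and it assumes no category has all-equal sales (that would divide by zero).

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['fruits'] * 5 + ['vegetables'] * 5,
    'product': ['apple', 'orange', 'durian', 'coconut', 'grape',
                'cabbage', 'carrot', 'spinach', 'grass', 'potato'],
    'sales': [10, 20, 30, 40, 100, 10, 30, 50, 60, 100],
})


def minmax(s):
    # Hypothetical helper: scales one group's values to [0, 1].
    # Assumes s.max() != s.min(); otherwise this divides by zero.
    return (s - s.min()) / (s.max() - s.min())


# transform() calls minmax once per category and keeps the original row order
df['scaled_sales_pd'] = df.groupby('category')['sales'].transform(minmax)
print(df)
```

Because the helper returns a Series with the same index as its input, the result lines up with the original rows without any extra work.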
You can iterate through your groups using:
# I believe this should work, but I haven't tried it out
for category, grouped in df.groupby('category'):
    grouped.to_csv(f"minmax/output/category-{category}-minmax.csv")
Upvotes: 0
Reputation: 22493
Use Series.unique:
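# scaler is the MinMaxScaler() instance created in the question
for i in df["category"].unique():
    cat = df[df['category'] == i].copy()  # copy to avoid SettingWithCopyWarning
    cat['sales'] = scaler.fit_transform(cat[['sales']])
    cat.to_csv('minmax/output/category-{}-minmax.csv'.format(i))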
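A self-contained version of this loop might look like the sketch below. It makes a few assumptions that are not in the original: the output folder may not exist yet, so it is created first; out_dir and written are illustrative names; a fresh MinMaxScaler is built per category; and index=False is an optional choice that leaves the row index out of the files.

```python
from pathlib import Path

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    'category': ['fruits'] * 5 + ['vegetables'] * 5,
    'product': ['apple', 'orange', 'durian', 'coconut', 'grape',
                'cabbage', 'carrot', 'spinach', 'grass', 'potato'],
    'sales': [10, 20, 30, 40, 100, 10, 30, 50, 60, 100],
})

# Assumption: the folder from the question may not exist yet, so create it.
out_dir = Path('minmax/output')
out_dir.mkdir(parents=True, exist_ok=True)

written = []  # illustrative: keeps track of the files that were created
for i in df['category'].unique():
    # .copy() gives an independent frame, so the assignment below is safe
    cat = df[df['category'] == i].copy()
    # fit_transform returns an (n, 1) array; ravel() turns it into one column
    cat['sales'] = MinMaxScaler().fit_transform(cat[['sales']]).ravel()
    path = out_dir / 'category-{}-minmax.csv'.format(i)
    cat.to_csv(path, index=False)  # index=False is optional
    written.append(path)

print(written)
```

Each category ends up in its own file, for example category-fruits-minmax.csv, with sales scaled within that category only.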
Upvotes: 1