Reputation: 37
I want to rank-normalise all variables in a pandas DataFrame to the range [0,1]. So far I can only do this for one variable (var1). Do you know how to apply this code to many variables in the DataFrame (from var1 to varn)? The code below is what I have done for the single variable var1 in a small example:
import pandas as pd
# Create dataframe
data = {'year': [1990, 1990, 1991, 1991, 1991],
        'var1': [10, 20, 45, 40, 55]}
df = pd.DataFrame(data)
obsperyearvar1 = df.groupby('year')['var1'].transform('size')
df['rankvar1'] = df.groupby('year')['var1'].rank().div(obsperyearvar1)
print(df)
   year  var1  rankvar1
0  1990    10  0.500000
1  1990    20  1.000000
2  1991    45  0.666667
3  1991    40  0.333333
4  1991    55  1.000000
Thank you in advance!
Upvotes: 0
Views: 98
Reputation: 14949
IIUC, you can try:
df = pd.concat([df, df.groupby('year').apply(pd.Series.rank, pct=True).filter(
    like='var').add_prefix('rank')], axis=1)
Complete example:
import pandas as pd
# Create dataframe
data = {'year': [1990, 1990, 1991, 1991, 1991],
'var1': [10, 20, 45, 40, 55],
'var2': [10, 1, 5, 40, 5]}
df = pd.DataFrame(data)
df = pd.concat([df, df.groupby('year').apply(pd.Series.rank, pct=True).filter(
    like='var').add_prefix('rank')], axis=1)
print(df)
   year  var1  var2  rankvar1  rankvar2
0  1990    10    10  0.500000       1.0
1  1990    20     1  1.000000       0.5
2  1991    45     5  0.666667       0.5
3  1991    40    40  0.333333       1.0
4  1991    55     5  1.000000       0.5
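For comparison, here is a minimal sketch of the most literal answer to the question: loop the original rank-divided-by-group-size code over the columns. It assumes that every column to normalise has a name starting with 'var'; that selection rule and the name var_cols are illustrative assumptions, not something stated in the question.

```python
import pandas as pd

# Sample data from the complete example above
data = {'year': [1990, 1990, 1991, 1991, 1991],
        'var1': [10, 20, 45, 40, 55],
        'var2': [10, 1, 5, 40, 5]}
df = pd.DataFrame(data)

# Assumption: the variables to normalise are the columns named 'var...'
var_cols = [c for c in df.columns if c.startswith('var')]

for col in var_cols:
    # Number of observations in each row's year group
    obs_per_year = df.groupby('year')[col].transform('size')
    # Within-year rank divided by the group size, as in the question
    df['rank' + col] = df.groupby('year')[col].rank().div(obs_per_year)

print(df)
```

The loop is more verbose than the one-liners, but it keeps the question's own logic visible and makes it easy to change which columns are included.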
Upvotes: 1
Reputation: 35636
Another option via join + groupby rank:
new_df = df.join(df.groupby('year').rank(pct=True).add_prefix('rank'))
new_df:
   year  var1  rankvar1
0  1990    10  0.500000
1  1990    20  1.000000
2  1991    45  0.666667
3  1991    40  0.333333
4  1991    55  1.000000
Sample data (thanks to @Nk03):
import pandas as pd
# Create dataframe
data = {'year': [1990, 1990, 1991, 1991, 1991],
'var1': [10, 20, 45, 40, 55],
'var2': [10, 1, 5, 40, 5]}
df = pd.DataFrame(data)
new_df = df.join(df.groupby('year').rank(pct=True).add_prefix('rank'))
print(new_df)
new_df:
   year  var1  var2  rankvar1  rankvar2
0  1990    10    10  0.500000       1.0
1  1990    20     1  1.000000       0.5
2  1991    45     5  0.666667       0.5
3  1991    40    40  0.333333       1.0
4  1991    55     5  1.000000       0.5
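If the DataFrame also holds columns that should not be ranked, the same idea can be limited to a chosen subset. The sketch below assumes the columns of interest are listed by hand in cols (a hypothetical name) and cross-checks rank(pct=True) against the rank-divided-by-group-size formula from the question on this sample data.

```python
import pandas as pd

data = {'year': [1990, 1990, 1991, 1991, 1991],
        'var1': [10, 20, 45, 40, 55],
        'var2': [10, 1, 5, 40, 5]}
df = pd.DataFrame(data)

# Assumption: these are the only columns we want to rank
cols = ['var1', 'var2']

# Percentile rank within each year, restricted to the selected columns
ranks = df.groupby('year')[cols].rank(pct=True).add_prefix('rank')
new_df = df.join(ranks)

# Cross-check against the question's formula: rank / observations per year
obs_per_year = df.groupby('year')['var1'].transform('size')
manual = df.groupby('year')[cols].rank().div(obs_per_year, axis=0).add_prefix('rank')
matches = bool((manual - ranks).abs().max().max() < 1e-12)

print(new_df)
print(matches)
```

Ties (such as the two 5s in var2 for 1991) get the average rank by default, which is why both rows show 0.5.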
Upvotes: 1