Reputation: 34056
I am pretty new to Pandas and I have a use-case like below: I have a Dataframe like:
print (df)
date sector symbol val1 val2
0 2000-01-31 IT A 82.10 19.2
1 2000-01-31 IT AA 28.00 20.3
2 2000-01-31 Sports AB 32.22 1.2
3 2000-02-27 Industry c 16.60 3.5
4 2000-02-27 Industry cc 96.24 2.6
"date", "sector" and "symbol" are my key columns. I want to group on "date" and rank the columns "val1" and "val2" in descending order and create a new dataframe with key columns and ranks for both (val1 and val2) instead of the actual values.
I'm using a for loop to iterate over the list of "date" and I use the pd.rank() function to find the rank. The problem here is that, I'm not able to properly Append rows to the new_df(the one with ranks). Below is my code:
new_df = df.iloc[:,0:3] # This holds only key cols(date,sector,symbol)
periods = np.sort(df['date'].dropna().unique())
grped=df.groupby('date')
for col in ['val1', 'val2']:
new_df['{}_rnk'.format(col)] = "" #Creating a blank column for rank
for dt in periods:
t = pd.DataFrame()
one = grped.get_group(dt)
t = one[col].rank(ascending=flag,method='average')
new_df['{}_rnk'.format(col)] = new_df['{}_rnk'.format(col)].append(t)
This is not giving me the right output. Can someone suggest what am I doing wrong or can tell a better approach?
Upvotes: 1
Views: 975
Reputation: 862661
I believe here is possible use GroupBy.rank
with add_suffix
and append to original DataFrame
by join
:
#set like need
flag=True
df1 = df.groupby('date')['val1', 'val2'].rank(ascending=flag,method='average')
df = df.join(df1.add_suffix('_rnk'))
print (df)
date sector symbol val1 val2 val1_rnk val2_rnk
0 2000-01-31 IT A 82.10 19.2 3.0 2.0
1 2000-01-31 IT AA 28.00 20.3 1.0 3.0
2 2000-01-31 Sports AB 32.22 1.2 2.0 1.0
3 2000-02-27 Industry c 16.60 3.5 1.0 2.0
4 2000-02-27 Industry cc 96.24 2.6 2.0 1.0
Another solution with custom function with specified columns for descending and ascending rank:
mapp = pd.DataFrame({'column_name':['val1','val2'], 'direction':['Asc','Desc']})
print(mapp)
column_name direction
0 val1 Asc
1 val2 Desc
asc_cols = [k for k, v in mapp.set_index('column_name')['direction'].items() if v == 'Asc']
desc_cols = [k for k, v in mapp.set_index('column_name')['direction'].items() if v == 'Desc']
print(asc_cols)
['val1']
print (desc_cols)
['val2']
Or:
asc_cols = []
desc_cols = []
for k, v in mapp.set_index('column_name')['direction'].items():
if v == 'Desc':
asc_cols.append(k)
elif v == 'Asc':
desc_cols.append(k)
print(asc_cols)
print (desc_cols)
def func(x):
x[asc_cols] = x[asc_cols].rank(ascending=True,method='average').add_suffix('_rnk')
x[desc_cols] = x[desc_cols].rank(ascending=False,method='average').add_suffix('_rnk')
return x
df1 = df.groupby('date')['val1', 'val2'].apply(func)
print (df1)
val1 val2
0 3.0 2.0
1 1.0 1.0
2 2.0 3.0
3 1.0 1.0
4 2.0 2.0
Upvotes: 2