Mayank Porwal
Mayank Porwal

Reputation: 34056

Appending rows in a specific column of a Pandas DF in a loop

I am pretty new to Pandas and I have a use-case like below: I have a Dataframe like:

print (df)

         date    sector symbol   val1  val2
0  2000-01-31        IT      A  82.10  19.2
1  2000-01-31        IT     AA  28.00  20.3
2  2000-01-31    Sports     AB  32.22   1.2
3  2000-02-27  Industry      c  16.60   3.5
4  2000-02-27  Industry     cc  96.24   2.6

"date", "sector" and "symbol" are my key columns. I want to group on "date" and rank the columns "val1" and "val2" in descending order and create a new dataframe with key columns and ranks for both (val1 and val2) instead of the actual values.

I'm using a for loop to iterate over the list of "date" and I use the pd.rank() function to find the rank. The problem here is that, I'm not able to properly Append rows to the new_df(the one with ranks). Below is my code:

    new_df = df.iloc[:,0:3] # This holds only key cols(date,sector,symbol)
    periods = np.sort(df['date'].dropna().unique())
    grped=df.groupby('date')
    for col in ['val1', 'val2']:
        new_df['{}_rnk'.format(col)] = "" #Creating a blank column for rank

        for dt in periods:
            t = pd.DataFrame()
            one = grped.get_group(dt)
            t = one[col].rank(ascending=flag,method='average')
            new_df['{}_rnk'.format(col)] = new_df['{}_rnk'.format(col)].append(t)

This is not giving me the right output. Can someone suggest what am I doing wrong or can tell a better approach?

Upvotes: 1

Views: 975

Answers (1)

jezrael
jezrael

Reputation: 862661

I believe here is possible use GroupBy.rank with add_suffix and append to original DataFrame by join:

#set like need
flag=True
df1 = df.groupby('date')['val1', 'val2'].rank(ascending=flag,method='average')
df = df.join(df1.add_suffix('_rnk'))

print (df)
         date    sector symbol   val1  val2  val1_rnk  val2_rnk
0  2000-01-31        IT      A  82.10  19.2       3.0       2.0
1  2000-01-31        IT     AA  28.00  20.3       1.0       3.0
2  2000-01-31    Sports     AB  32.22   1.2       2.0       1.0
3  2000-02-27  Industry      c  16.60   3.5       1.0       2.0
4  2000-02-27  Industry     cc  96.24   2.6       2.0       1.0

Another solution with custom function with specified columns for descending and ascending rank:

mapp = pd.DataFrame({'column_name':['val1','val2'], 'direction':['Asc','Desc']})
print(mapp)
  column_name direction
0        val1       Asc
1        val2      Desc

asc_cols = [k for k, v in mapp.set_index('column_name')['direction'].items() if v == 'Asc']
desc_cols = [k for k, v in mapp.set_index('column_name')['direction'].items() if v == 'Desc']
print(asc_cols)
['val1']
print (desc_cols)
['val2']

Or:

asc_cols = []
desc_cols = []

for k, v in mapp.set_index('column_name')['direction'].items():
    if v == 'Desc':
        asc_cols.append(k)
    elif v == 'Asc':
        desc_cols.append(k)  

print(asc_cols)
print (desc_cols)

def func(x):
    x[asc_cols] = x[asc_cols].rank(ascending=True,method='average').add_suffix('_rnk')
    x[desc_cols] = x[desc_cols].rank(ascending=False,method='average').add_suffix('_rnk')
    return x

df1 = df.groupby('date')['val1', 'val2'].apply(func)
print (df1)
   val1  val2
0   3.0   2.0
1   1.0   1.0
2   2.0   3.0
3   1.0   1.0
4   2.0   2.0

Upvotes: 2

Related Questions