Tikhon
Tikhon

Reputation: 451

Python DataFrame: rename columns from another DataFrame

I have a large DataFrame (Output_Frame) with columns defined by a list of two strings:

output_frame.columns
Out[14]: 
MultiIndex([('@:M3WRLD', 'AS1GRO'),
            ('@:M3WRLD', 'AS1GRO'),
            ('@:AFM2M2', 'AS1GRO'),
            ('@:AFM2E2', 'AS1GRO'),
        ...
names=['Instrument', 'Field'], length=903)

I would like to rename these columns to a single string contained in another DataFrame (Key):

                                Name Series_Code Datatype_Code  
0  MSCI AC WORLD - CAL FY1 GROWTH SAL    @:M3WRLD        AS1GRO  
1  MSCI AC WORLD - FY2 YOY GROWTH SAL    @:M3WRLD        AS2GRO  
2  MSCI AC WORLD - FY3 YOY GROWTH SAL    @:M3WRLD        AS3GRO  
3  MSCI AC WORLD - CAL FY1 YOY GROWTH    @:M3WRLD        AF1GRO  

To reiterate: the column names of Output_Frame are the frields 'Series_Code' and 'Datatype_Code' from the Key frame. I would like to rename the column names in Output_Frame according to the Name Variable from Key.

For instance the first column in Output_Frame is:

('@:M3WRLD', 'AS1GRO')

And I want it to become:

MSCI AC WORLD - CAL FY1 GROWTH SAL

The following synatx is incorrect but I believe the logic would work.

for col in output_frame.columns:
    for row in key.rows:
        if (key[row, 'Series_Code'] == col[0]) && (key[row, 'Datatype_Code'] == col[1]):
            output_frame.column(col) = key[row, 'Type']

I would be grateful for suggestions in fixing this syntax or for suggesting a better way of accomplishing this - one which doesn't involve iteration. I'm very much a novice in Python and pandas and very grateful for your help.

Upvotes: 4

Views: 372

Answers (1)

Andy L.
Andy L.

Reputation: 25269

You basically want to change 2-level multiindex columns of dataframe output_frame to single level index columns where values are mapped to values in key dataframe. I would say it is a risky logic. You need to make sure every pair-items of multindex exist in key dataframe. However, it is doing able. Assume key dataframe is named df_key. You may do the following:

Sample `output_frame` bases on your provided `multiindex` columns

Instrument @:M3WRLD        @:AFM2M2 @:AFM2E2
Field        AS1GRO AS1GRO   AS1GRO   AS1GRO
0                70     81       74       48

df_key
Out[539]:
                                 Name Series_Code Datatype_Code
0  MSCI AC WORLD - CAL FY1 GROWTH SAL    @:M3WRLD        AS1GRO
1  MSCI AC WORLD - FY2 YOY GROWTH SAL    @:M3WRLD        AS2GRO
2  MSCI AC WORLD - FY3 YOY GROWTH SAL    @:M3WRLD        AS3GRO
3  MSCI AC WORLD - CAL FY1 YOY GROWTH    @:M3WRLD        AF1GRO

#======================================================================
#create a dictionary from `df_key`
d = df_key.set_index(['Series_Code', 'Datatype_Code']).Name.to_dict()

##or
d = {tuple(v): k for k, *v in zip(*map(df_key.get, df_key))}

Out[526]:
{('@:M3WRLD', 'AS1GRO'): 'MSCI AC WORLD - CAL FY1 GROWTH SAL',
 ('@:M3WRLD', 'AS2GRO'): 'MSCI AC WORLD - FY2 YOY GROWTH SAL',
 ('@:M3WRLD', 'AS3GRO'): 'MSCI AC WORLD - FY3 YOY GROWTH SAL',
 ('@:M3WRLD', 'AF1GRO'): 'MSCI AC WORLD - CAL FY1 YOY GROWTH'}

#Use map on `output_dataframe.columns` to flatten and change its values to values from dictionary created from `df_key`
output_frame.columns = output_frame.columns.map(lambda x:  
                                                d.get(tuple(x), f'{x[0]} - {x[1]}'))


Out[534]:
   MSCI AC WORLD - CAL FY1 GROWTH SAL  MSCI AC WORLD - CAL FY1 GROWTH SAL  \
0                                  70                                  81

   @:AFM2M2 - AS1GRO  @:AFM2E2 - AS1GRO
0                 74                 48

Note: as I say you must make sure every pair-items of multindex exist in key dataframe. Your sample output_frame has mutiindex ('@:AFM2M2', 'AS1GRO') and ('@:AFM2E2', 'AS1GRO'), but df_key doesn't have these combos. Therefore, I chose to flatten them to @:AFM2M2 - AS1GRO and @:AFM2E2 - AS1GRO instead of assigning NaN or NoNe to their locations. Second, your sample output_frame has duplicated mutiindex columns, so the mapping also mapped to duplicated values.

Upvotes: 1

Related Questions