mvx
mvx

Reputation: 370

Renaming columns in dataframe w.r.t another specific column

BACKGROUND: Large excel mapping file with about 100 columns and 200 rows converted to .csv. Then stored as dataframe. General format of df as below.

Starts with a named column (e.g. Sales) and following two columns need to be renamed. This pattern needs to be repeated for all columns in excel file.

Essentially: Link the subsequent 2 columns to the "parent" one preceding them.

 Sales Unnamed: 2  Unnamed: 3  Validation Unnamed: 5 Unnamed: 6
0       Commented  No comment             Commented  No comment                                   
1     x                                             x                        
2                            x          x                                                
3                x                                             x 

APPROACH FOR SOLUTION: I assume it would be possible to begin with an index (e.g. index of Sales column 1 = x) and then rename the following two columns as (x+1) and (x+2). Then take in the text for the next named column (e.g. Validation) and so on.

I know the rename() function for dataframes.

BUT, not sure how to apply the iteratively for changing column titles.

EXPECTED OUTPUT: Unnamed 2 & 3 changed to Sales_Commented and Sales_No_Comment, respectively.

Similarly Unnamed 5 & 6 change to Validation_Commented and Validation_No_Comment.

Again, repeated for all 100 columns of file.

EDIT: Due to the large number of cols in the file, creating a manual list to store column names is not a viable solution. I have already seen this elsewhere on SO. Also, the amount of columns and departments (Sales, Validation) changes in different excel files with the mapping. So a dynamic solution is required.

  Sales Sales_Commented Sales_No_Comment Validation Validation_Commented Validation_No_Comment
0             Commented       No comment                       Commented            No comment
1     x                                                                x                      
2                                      x                                                      
3                     x                           x                                          x

As a python novice, I considered a possible approach for the solution using the limited knowledge I have, but not sure what this would look like as a workable code.

I would appreciate all help and guidance.

Upvotes: 1

Views: 1044

Answers (1)

aunsid
aunsid

Reputation: 397

1.You need is to make a list with the column names that you would want.
2.Make it a dict with the old column names as the keys and new column name as the values.
3. Use df.rename(columns = your_dictionary).

import numpy as np
import pandas as pd
df = pd.read_excel("name of the excel file",sheet_name = "name of sheet")


print(df.head()) 
Output>>>
    Sales   Unnamed : 2     Unnamed : 3     Validation  Unnamed : 5     Unnamed : 6     Unnamed :7
0   NaN     Commented   No comment  NaN     Comment     No comment  Extra
1   1.0     2   1   1.0     1   1   1
2   3.0     1   1   1.0     1   1   1
3   4.0     3   4   5.0     5   6   6
4   5.0     1   1   1.0     21  3   6

# get new names based on the values of a previous named column
new_column_names = []
counter = 0
for col_name in df.columns:

    if (col_name[:7].strip()=="Unnamed"):

        new_column_names.append(base_name+"_"+df.iloc[0,counter].replace(" ", "_"))
    else:
        base_name = col_name
        new_column_names.append(base_name)

    counter +=1


# convert to dict key pair
dictionary = dict(zip(df.columns.tolist(),new_column_names))

# rename columns
df = df.rename(columns=dictionary)

# drop first column
df = df.iloc[1:].reset_index(drop=True)

print(df.head())
Output>>
    Sales   Sales_Commented     Sales_No_comment    Validation  Validation_Comment  Validation_No_comment   Validation_Extra
0   1.0     2   1   1.0     1   1   1
1   3.0     1   1   1.0     1   1   1
2   4.0     3   4   5.0     5   6   6
3   5.0     1   1   1.0     21  3   6

Upvotes: 2

Related Questions