jovicbg

Reputation: 1553

Use yaml file to rename Pandas dataframe columns

I have heard somewhere that it is possible to pass a YAML file to a Python script to rename the columns of a pandas dataframe. But I have no idea how to do that, and I'm not sure I have found anything useful.

For example, yaml:

mappings:
    new_column_name1: [old_name_1, old_name_2, old_name_3, old_name_4], 
    new_columns_name2: [old_name_5, old_name_6, old_name_7, old_name_8]

df:

old_name1  old_name_6
    1           4
    3           6
    6           31

Is it possible to use a YAML file like this to rename columns (each column whose name appears in the list [old_name_1, old_name_2, old_name_3, old_name_4] is renamed to new_column_name1), and what is the best way to do that?

I know I haven't provided any code I have tried, but I really have no idea where to start. Also, any other suggestions about good practice for renaming a large number of columns across multiple dataframes are welcome.

Upvotes: 1

Views: 1107

Answers (1)

Ami Tavory

Reputation: 76386

Your example doesn't seem to be legal YAML (note the trailing comma after the first list). Rather, it should be something like:

mappings:
    new_column_name1: 
        - old_name_1 
        - old_name_2 
        - old_name_3 
        - old_name_4

and so on.

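In any case, if you install PyYAML (which provides the yaml module), you can use something like:

import yaml  # provided by the PyYAML package

# Load the {new_name: [old_names, ...]} mapping from the YAML file
with open('foo.yaml', 'r') as f:
    d = yaml.safe_load(f)['mappings']

# df is assumed to be an existing DataFrame
cols = []
for c in df.columns:
    cols.append(c)  # keep the original name by default
    for k, v in d.items():
        if c in v:
            cols[-1] = k  # c is listed under k, so use the new name
            break
df.columns = cols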
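
As for renaming many columns across several dataframes, one possible alternative to the loop above is to invert the mapping once into an old-name-to-new-name dict and pass it to DataFrame.rename. This is only a sketch under some assumptions: the YAML text is inlined as a string instead of being read from a file, and YAML_TEXT, build_rename_map, and the sample column names are made up for illustration.

```python
import pandas as pd
import yaml  # provided by the PyYAML package

# Hypothetical YAML content; in practice this would be read from a file.
YAML_TEXT = """
mappings:
    new_column_name1:
        - old_name_1
        - old_name_2
    new_column_name2:
        - old_name_5
        - old_name_6
"""


def build_rename_map(mappings):
    """Invert {new: [old, ...]} into {old: new} for DataFrame.rename."""
    return {old: new for new, olds in mappings.items() for old in olds}


rename_map = build_rename_map(yaml.safe_load(YAML_TEXT)["mappings"])

# Illustrative dataframe with one column from each group.
df = pd.DataFrame({"old_name_1": [1, 3, 6], "old_name_6": [4, 6, 31]})

# Column names that are not keys of rename_map are left unchanged.
df = df.rename(columns=rename_map)
print(list(df.columns))
```

Because rename_map is built once, the same dict can be reused for every dataframe, e.g. dfs = [d.rename(columns=rename_map) for d in dfs]. Note that if two columns in one dataframe map to the same new name, you will end up with duplicate column names, so you may want to check for that.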

Upvotes: 1
