vossa
vossa

Reputation: 15

Rename columns based on substring and dict in python

I need to rename columns in a dataframe based on a number in the column name.

                    SLL20100_CH32 SLL201301_CH04 SLL201812_CH02 SLL20123_CH03
Datum/Uhrzeit                                                                 
2019-07-22 13:45:00         6,170        12,247       866,133         4,728   
2019-07-22 16:45:00           NaN        12,251       864,429         4,729   
2019-07-22 19:45:00           NaN        12,247       864,837         4,730   
2019-07-22 22:45:00           NaN        12,251       865,077         4,731   
2019-07-23 01:45:00           NaN        12,255       865,640         4,732   

I created a dict

key = {"CH01": "Temp",
       "CH02": "LF",
       "CH03": "WS",
       "CH04": "Sal",
       "CH05": "TDS",
       "CH06": "Dichte",
       "CH32": "Batterie",
       }

and try to rename the columns by data.rename(key, axis='columns', inplace=True) but that only works if I remove the SLL-Number before. Since they are different lengths, I would have to remove them manually, which I don't want to do.

I would like to have a solution where I can rename the whole column based on the substring and the dict so that it looks like this:

                    Batterie    Temp       LF     WS
Datum/Uhrzeit                                                                
2019-07-22 13:45:00    6,170  12,247  866,133  4,728
2019-07-22 16:45:00      NaN  12,251  864,429  4,729
2019-07-22 19:45:00      NaN  12,247  864,837  4,730
2019-07-22 22:45:00      NaN  12,251  865,077  4,731
2019-07-23 01:45:00      NaN  12,255  865,640  4,732

Could someone please help me? I have searched a lot but could not find a proper solution yet.

Upvotes: 1

Views: 1857

Answers (3)

StupidWolf
StupidWolf

Reputation: 46908

If it is always the string after _ , just substitute the string out and find the value in the dictionary:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.uniform(0,1,(5,4)),
columns=['SLL20100_CH32','SLL201301_CH04','SLL201812_CH02','SLL20123_CH03'])

new_names = [key.get(i) for i in df.columns.str.replace("[^ ]*_","")]

new_names
['Batterie', 'Sal', 'LF', 'WS']

df.columns = new_names

Upvotes: 3

WolVes
WolVes

Reputation: 1336

assuming all values will have a match you could utilize this:

vals = []
for key_val in key.keys():
    for column in df.columns:
        if key_val in column:
            vals.append(key[key_val])
df.columns = vals

else:

vals = dict()
for key_val in key.keys():
    for column in df.columns:
        if key_val in column:
            vals[column] = key[key_val]
df = df.rename(columns = vals)

Upvotes: 1

PieCot
PieCot

Reputation: 3639

You could use regular expressions to parse the name of the column and extract the key used in your dictionary:

import re

regex = re.compile(r'^(?:[A-Z0-9]+)\_(CH\d+)$')
df.columns = [key[regex.match(c).groups()[0]] for c in df.columns]

I assume that your column names have the following structure:

<Capitol letters and digits>_CH<one or more digits>

The following code shows an example with the data you provided:

import re
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=np.random.uniform(0,1,(5,4)),
    columns=['SLL20100_CH32','SLL201301_CH04','SLL201812_CH02','SLL20123_CH03']
)
key = {"CH01": "Temp",
       "CH02": "LF",
       "CH03": "WS",
       "CH04": "Sal",
       "CH05": "TDS",
       "CH06": "Dichte",
       "CH32": "Batterie",
       }

regex = re.compile(r'^(?:[A-Z0-9]+)\_(CH\d+)$')
df.columns = [key[regex.match(c).groups()[0]] for c in df.columns]

You get:

   Batterie       Sal        LF        WS
0  0.187421  0.865569  0.719451  0.589747
1  0.214085  0.214905  0.101525  0.776853
2  0.588324  0.940583  0.368402  0.822162
3  0.776384  0.634419  0.482400  0.808173
4  0.397829  0.256699  0.718301  0.334272

P.S.: Thanks to @StupidWolf for the example of the DataFrame.

Upvotes: 2

Related Questions