Reputation: 15
I need to rename columns in a dataframe based on a number in the column name.
SLL20100_CH32 SLL201301_CH04 SLL201812_CH02 SLL20123_CH03
Datum/Uhrzeit
2019-07-22 13:45:00 6,170 12,247 866,133 4,728
2019-07-22 16:45:00 NaN 12,251 864,429 4,729
2019-07-22 19:45:00 NaN 12,247 864,837 4,730
2019-07-22 22:45:00 NaN 12,251 865,077 4,731
2019-07-23 01:45:00 NaN 12,255 865,640 4,732
I created a dict
key = {"CH01": "Temp",
"CH02": "LF",
"CH03": "WS",
"CH04": "Sal",
"CH05": "TDS",
"CH06": "Dichte",
"CH32": "Batterie",
}
and try to rename the columns by data.rename(key, axis='columns', inplace=True)
but that only works if I remove the SLL-Number before. Since they are different lengths, I would have to remove them manually, which I don't want to do.
I would like to have a solution where I can rename the whole column based on the substring and the dict so that it looks like this:
Batterie Temp LF WS
Datum/Uhrzeit
2019-07-22 13:45:00 6,170 12,247 866,133 4,728
2019-07-22 16:45:00 NaN 12,251 864,429 4,729
2019-07-22 19:45:00 NaN 12,247 864,837 4,730
2019-07-22 22:45:00 NaN 12,251 865,077 4,731
2019-07-23 01:45:00 NaN 12,255 865,640 4,732
Could someone please help me? I have searched a lot but could not find a proper solution yet.
Upvotes: 1
Views: 1857
Reputation: 46908
If it is always the string after _
, just substitute the string out and find the value in the dictionary:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.uniform(0,1,(5,4)),
columns=['SLL20100_CH32','SLL201301_CH04','SLL201812_CH02','SLL20123_CH03'])
new_names = [key.get(i) for i in df.columns.str.replace("[^ ]*_","")]
new_names
['Batterie', 'Sal', 'LF', 'WS']
df.columns = new_names
Upvotes: 3
Reputation: 1336
assuming all values will have a match you could utilize this:
vals = []
for key_val in key.keys():
for column in df.columns:
if key_val in column:
vals.append(key[key_val])
df.columns = vals
else:
vals = dict()
for key_val in key.keys():
for column in df.columns:
if key_val in column:
vals[column] = key[key_val]
df = df.rename(columns = vals)
Upvotes: 1
Reputation: 3639
You could use regular expressions to parse the name of the column and extract the key used in your dictionary:
import re
regex = re.compile(r'^(?:[A-Z0-9]+)\_(CH\d+)$')
df.columns = [key[regex.match(c).groups()[0]] for c in df.columns]
I assume that your column names have the following structure:
<Capitol letters and digits>_CH<one or more digits>
The following code shows an example with the data you provided:
import re
import numpy as np
import pandas as pd
df = pd.DataFrame(
data=np.random.uniform(0,1,(5,4)),
columns=['SLL20100_CH32','SLL201301_CH04','SLL201812_CH02','SLL20123_CH03']
)
key = {"CH01": "Temp",
"CH02": "LF",
"CH03": "WS",
"CH04": "Sal",
"CH05": "TDS",
"CH06": "Dichte",
"CH32": "Batterie",
}
regex = re.compile(r'^(?:[A-Z0-9]+)\_(CH\d+)$')
df.columns = [key[regex.match(c).groups()[0]] for c in df.columns]
You get:
Batterie Sal LF WS
0 0.187421 0.865569 0.719451 0.589747
1 0.214085 0.214905 0.101525 0.776853
2 0.588324 0.940583 0.368402 0.822162
3 0.776384 0.634419 0.482400 0.808173
4 0.397829 0.256699 0.718301 0.334272
P.S.: Thanks to @StupidWolf for the example of the DataFrame.
Upvotes: 2