The Great

Reputation: 7733

Elegant and efficient way to replace multiple terms in a pandas column

I would like to replace multiple values in a DataFrame column, as shown below:

df['label'] = ['Sodium', 'Bicarbonate', 'White Blood Cells', 'Hemoglobin',
       'Glucose', 'Lactate', 'pH', 'Potassium, Whole Blood',
       'Sodium, Whole Blood', 'Lactate Dehydrogenase (LD)',
       'Bilirubin, Direct', 'Alkaline Phosphatase',
       'Alanine Aminotransferase (ALT)',
       'Asparate Aminotransferase (AST)', 'Potassium', 'Phosphate',
       'Creatinine', 'C-Reactive Protein', 'pCO2',
       'Calculated Bicarbonate, Whole Blood', 'Bilirubin, Total',
       'Albumin', 'Bilirubin, Indirect', 'Urine Volume', 'WBC Count',
       'Urine Volume, Total', 'Phosphate, Body Fluid']

In the code below, I am trying to replace both Sodium and Sodium, Whole Blood with just Sodium.

I do the same for the rest of the measurements:

df['label'] = df['label'].replace(dict.fromkeys(['Sodium','Sodium, Whole Blood'], 'Sodium'))
df['label'] = df['label'].replace(dict.fromkeys(['Bicarbonate','Calculated Bicarbonate, Whole Blood'], 'Bicarbonate'))
df['label'] = df['label'].replace(dict.fromkeys(['Bilirubin, Direct','Bilirubin, Total','Bilirubin, Indirect'], 'Bilirubin'))
df['label'] = df['label'].replace(dict.fromkeys(['Urine Volume, Total','Urine Volume'], 'Urine Volume'))
df['label'] = df['label'].replace(dict.fromkeys(['White Blood Cells','WBC Count'], 'WBC'))
df['label'] = df['label'].replace(dict.fromkeys(['Potassium, Whole Blood','Potassium'], 'Potassium'))
df['label'] = df['label'].replace(dict.fromkeys(['Phosphate','Phosphate, Body Fluid'], 'Phosphate'))

Although the above code works, is there a more efficient way to do this than repeating the same line of code multiple times?

Upvotes: 2

Views: 99

Answers (1)

Quang Hoang

Reputation: 150785

One way is to build a single large dictionary and call replace once:

# add the rest of your (variants, target) pairs here
lst = [(['Sodium','Sodium, Whole Blood'], 'Sodium'),
       (['Bicarbonate','Calculated Bicarbonate, Whole Blood'], 'Bicarbonate')
      ]

repl_dict = {}
for x,y in lst:
    repl_dict.update(dict.fromkeys(x,y))

df['label'] = df['label'].replace(repl_dict)
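
A variation on the same idea, as a minimal self-contained sketch: keep the groups in a dict keyed by the target name and invert it with a comprehension. The name `groups` and the small sample frame are only assumptions for illustration (the labels come from the question), and the groups shown are a subset you would extend with your own.

```python
import pandas as pd

# Hypothetical sample frame; the labels are taken from the question.
df = pd.DataFrame({'label': ['Sodium', 'Sodium, Whole Blood', 'WBC Count',
                             'White Blood Cells', 'Glucose']})

# Target name -> variants that should collapse into it (illustrative subset).
groups = {
    'Sodium': ['Sodium, Whole Blood'],
    'WBC': ['White Blood Cells', 'WBC Count'],
}

# Invert into a flat variant -> target mapping.
repl_dict = {variant: target
             for target, variants in groups.items()
             for variant in variants}

# One exact-match replace; labels not in the mapping are left unchanged.
df['label'] = df['label'].replace(repl_dict)
print(df['label'].tolist())
# ['Sodium', 'Sodium', 'WBC', 'WBC', 'Glucose']
```

A label that already equals its target (such as 'Sodium') does not need to be listed among the variants, since replace leaves unmatched values alone.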

Upvotes: 3
