Search columns for a specific set of text and if the text is found enter new a new string of text in a new column pandas

Question

I am trying to look for instances where the words 'gas', diesel, or 'ev' occur in any of my columns in my dataframe (not case sensitive). If any of version of those words are found in the columns I would like to enter an abbreviation for the fuel type in a new column entitled "FUEL".

excerpt of my dataframe

SUMN                 SOUN               MATN   
Light duty vehicle   Diesel Tire wear   Rubber
Heavy duty diesel    Non-catalyst       Diesel
Light duty truck     catalyst           Gasoline
Medium duty vehicle  EV brake wear      brakes

What I'm hoping to output
SUMN                 SOUN               MATN      FUEL
Light duty vehicle   Diesel Tire wear   Rubber    DSL
Heavy duty diesel    Non-catalyst       Diesel    DSL
Light duty truck     catalyst           Gasoline  GAS
Medium duty vehicle  EV brake wear      brakes    ELEC

How do I accomplish this?

I got as far as begin able to look at one column for one type of string, but got stumped on how to move past this point.

df['FUEL'] = df['SUMN'].str.contains('diesel', case=False)

Derek Eden · Accepted Answer

There's definitely a more optimized solution, but hope this puts you on the right path...basically loops through each row, looping through the columns and potential fuel strings and decides which abbr to use:

d={'diesel':'DSL','gasoline':'GAS','ev':'ELEC'}
df['all'] = df.apply(''.join, axis=1)
for i,row in df.iterrows():
    df.at[i,'FUEL'] = d[[key for key in d.keys() if key in row['all'].lower()][0]]

del df['all']

output:

                  SUMN              SOUN      MATN  FUEL
0   Light duty vehicle  Diesel Tire wear    Rubber   DSL
1    Heavy duty diesel      Non-catalyst    Diesel   DSL
2     Light duty truck          catalyst  Gasoline   GAS
3  Medium duty vehicle     EV brake wear    brakes  ELEC

this assume that only one of the fuel types occurs in each row

EDIT: inspired by the other solution:

import re
d={'diesel':'DSL','gasoline':'GAS','ev':'ELEC'}
df['FUEL'] = df.apply(lambda x: d[re.search('gasoline|diesel|ev',''.join(x).lower()).group()], axis=1)

same output :)

Search columns for a specific set of text and if the text is found enter new a new string of text in a new column pandas

Answers (2)

Related Questions