ACAYA
ACAYA

Reputation: 55

Search columns for a specific set of text and if the text is found enter new a new string of text in a new column pandas

I am trying to look for instances where the words 'gas', diesel, or 'ev' occur in any of my columns in my dataframe (not case sensitive). If any of version of those words are found in the columns I would like to enter an abbreviation for the fuel type in a new column entitled "FUEL".

excerpt of my dataframe

SUMN                 SOUN               MATN   
Light duty vehicle   Diesel Tire wear   Rubber
Heavy duty diesel    Non-catalyst       Diesel
Light duty truck     catalyst           Gasoline
Medium duty vehicle  EV brake wear      brakes

What I'm hoping to output
SUMN                 SOUN               MATN      FUEL
Light duty vehicle   Diesel Tire wear   Rubber    DSL
Heavy duty diesel    Non-catalyst       Diesel    DSL
Light duty truck     catalyst           Gasoline  GAS
Medium duty vehicle  EV brake wear      brakes    ELEC

How do I accomplish this?

I got as far as begin able to look at one column for one type of string, but got stumped on how to move past this point.

df['FUEL'] = df['SUMN'].str.contains('diesel', case=False)

Upvotes: 0

Views: 587

Answers (2)

Derek Eden
Derek Eden

Reputation: 4618

There's definitely a more optimized solution, but hope this puts you on the right path...basically loops through each row, looping through the columns and potential fuel strings and decides which abbr to use:

d={'diesel':'DSL','gasoline':'GAS','ev':'ELEC'}
df['all'] = df.apply(''.join, axis=1)
for i,row in df.iterrows():
    df.at[i,'FUEL'] = d[[key for key in d.keys() if key in row['all'].lower()][0]]

del df['all']

output:

                  SUMN              SOUN      MATN  FUEL
0   Light duty vehicle  Diesel Tire wear    Rubber   DSL
1    Heavy duty diesel      Non-catalyst    Diesel   DSL
2     Light duty truck          catalyst  Gasoline   GAS
3  Medium duty vehicle     EV brake wear    brakes  ELEC

this assume that only one of the fuel types occurs in each row

EDIT: inspired by the other solution:

import re
d={'diesel':'DSL','gasoline':'GAS','ev':'ELEC'}
df['FUEL'] = df.apply(lambda x: d[re.search('gasoline|diesel|ev',''.join(x).lower()).group()], axis=1)

same output :)

Upvotes: 1

Erfan
Erfan

Reputation: 42906

Here's a way using apply with str.contains to check all columns for each word. And finally we map the words to the correct ones, for example ev -> ELECT.

Note that I use ?i in my regex, which makes it not case sensitive:

words = ['gas', 'diesel', 'ev']
mapping = {'gas':'GAS', 'diesel':'DSL', 'ev':'ELEC'}

for word in words:
    m = df.apply(lambda x: x.str.contains(f'(?i)({word})')).any(axis=1)
    df.loc[m, 'FUEL'] = mapping[word]

Output

                  SUMN              SOUN      MATN  FUEL
0   Light duty vehicle  Diesel Tire wear    Rubber   DSL
1    Heavy duty diesel      Non-catalyst    Diesel   DSL
2     Light duty truck          catalyst  Gasoline   GAS
3  Medium duty vehicle     EV brake wear    brakes  ELEC

Upvotes: 3

Related Questions