Reputation: 55
I am trying to look for instances where the words 'gas', diesel, or 'ev' occur in any of my columns in my dataframe (not case sensitive). If any of version of those words are found in the columns I would like to enter an abbreviation for the fuel type in a new column entitled "FUEL".
excerpt of my dataframe
SUMN SOUN MATN
Light duty vehicle Diesel Tire wear Rubber
Heavy duty diesel Non-catalyst Diesel
Light duty truck catalyst Gasoline
Medium duty vehicle EV brake wear brakes
What I'm hoping to output
SUMN SOUN MATN FUEL
Light duty vehicle Diesel Tire wear Rubber DSL
Heavy duty diesel Non-catalyst Diesel DSL
Light duty truck catalyst Gasoline GAS
Medium duty vehicle EV brake wear brakes ELEC
How do I accomplish this?
I got as far as begin able to look at one column for one type of string, but got stumped on how to move past this point.
df['FUEL'] = df['SUMN'].str.contains('diesel', case=False)
Upvotes: 0
Views: 587
Reputation: 4618
There's definitely a more optimized solution, but hope this puts you on the right path...basically loops through each row, looping through the columns and potential fuel strings and decides which abbr to use:
d={'diesel':'DSL','gasoline':'GAS','ev':'ELEC'}
df['all'] = df.apply(''.join, axis=1)
for i,row in df.iterrows():
df.at[i,'FUEL'] = d[[key for key in d.keys() if key in row['all'].lower()][0]]
del df['all']
output:
SUMN SOUN MATN FUEL
0 Light duty vehicle Diesel Tire wear Rubber DSL
1 Heavy duty diesel Non-catalyst Diesel DSL
2 Light duty truck catalyst Gasoline GAS
3 Medium duty vehicle EV brake wear brakes ELEC
this assume that only one of the fuel types occurs in each row
EDIT: inspired by the other solution:
import re
d={'diesel':'DSL','gasoline':'GAS','ev':'ELEC'}
df['FUEL'] = df.apply(lambda x: d[re.search('gasoline|diesel|ev',''.join(x).lower()).group()], axis=1)
same output :)
Upvotes: 1
Reputation: 42906
Here's a way using apply
with str.contains
to check all columns for each word. And finally we map the words to the correct ones, for example ev -> ELECT
.
Note that I use ?i
in my regex, which makes it not case sensitive:
words = ['gas', 'diesel', 'ev']
mapping = {'gas':'GAS', 'diesel':'DSL', 'ev':'ELEC'}
for word in words:
m = df.apply(lambda x: x.str.contains(f'(?i)({word})')).any(axis=1)
df.loc[m, 'FUEL'] = mapping[word]
Output
SUMN SOUN MATN FUEL
0 Light duty vehicle Diesel Tire wear Rubber DSL
1 Heavy duty diesel Non-catalyst Diesel DSL
2 Light duty truck catalyst Gasoline GAS
3 Medium duty vehicle EV brake wear brakes ELEC
Upvotes: 3