Python Compute new variable based on substring

Question

I have a dataframe with one column containing text, and I would like to create a new variable if a substring exists on a column

original df

team
1New York MetsMets1
2Seattle MarinersMariners2
3Los Angeles AngelsAngels3

new df

team                       code  
1New York MetsMets1        NYM 
2Seattle MarinersMariners2 SEA
3Los Angeles AngelsAngels3 ANA

So basically if the column contains the text 'Mets' then code column will be NYM, if it contains 'Mariners' then code equals SEA, etc. The reason for this is because the Team column can contain extraneous characters or extra information that I want to ignore, so I need some type of rule such that:

if exists in team(Mariners) then code = 'SEA'
if exists in team(New York Mets) then code = 'Mets'

Thanks!

DataSciRookie · Accepted Answer

You should try something like this :

df.loc[df['team'].str.contains('Mariners'),'code']='SEA'

but with a loop depending of the number of different words. You should do some preprocessing, maybe put everything in lower case to be sure you are not missing mariners instead of Mariners

Python Compute new variable based on substring

Answers (2)

Related Questions