I have this code to clean LinkedIn job titles:
def clean_title(position):
    if 'back-end' in position.lower():
        return 'Backend Developer'
    elif 'front-end' in position.lower():
        return 'Frontend Developer'
    elif 'full-stack' in position.lower():
        return 'Fullstack Developer'
    elif '-' in position.lower():
        return position.split('-')[0].strip()
    elif '|' in position:
        return position.split('|')[0].strip()
    elif '(' in position:
        return position.split('(')[0].strip()
    elif ':' in position:
        return position.split(':')[1].split('-')[0].strip()
    elif '-' in position:
        # Never reached: the "'-' in position.lower()" branch above already
        # handles every position that contains '-'.
        if '-' in position.split('-')[0].strip():
            return position.split('-')[0].split('-')[0].strip()
        else:
            return position.split('-')[0].strip()
    else:
        return position

df['Position'].apply(clean_title).value_counts()
After executing the code above, some job titles are still incorrect. After applying the clean_title function, I want to split the titles that still contain the hyphen (-) character.
https://i.sstatic.net/hYKbJ.png
How can I proceed?
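One thing worth checking before adding another split: in the question's clean_title, the `'-' in position.lower()` test runs before the `|`, `(` and `:` branches, so those branches only ever see strings without an ASCII `-`, and the `-` branch itself returns the text before the first `-`. The function's return value therefore cannot contain an ASCII hyphen-minus. If titles that look hyphenated survive, they most likely contain a different dash character such as the en dash (U+2013); that is an assumption, since the screenshot is not reproduced here. A small stdlib-only sketch to list the dash-like characters actually present (the helper name `dash_like_chars` is hypothetical):

```python
import unicodedata


def dash_like_chars(titles):
    """Return {(char, unicode_name)} for every dash-like character in `titles`."""
    found = set()
    for title in titles:
        if not isinstance(title, str):  # skip NaN/None values from a DataFrame column
            continue
        for ch in title:
            # 'Pd' is the Unicode "Punctuation, dash" category (hyphen-minus, en dash,
            # em dash, ...). The minus sign U+2212 is a math symbol ('Sm'), so it is
            # added explicitly.
            if unicodedata.category(ch) == 'Pd' or ch == '\u2212':
                found.add((ch, unicodedata.name(ch)))
    return found


print(dash_like_chars(['Data Engineer \u2013 Remote', 'Dev-Ops', None]))
```

On the real data this would be called as `dash_like_chars(df['Position'].apply(clean_title))`; anything other than `HYPHEN-MINUS` in the result is a character the `'-'` checks never match.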
I think what you want is to call clean_title() recursively, so that whatever is left after one split is passed through the same checks again. Maybe an implementation like:
def clean_title(position):
    # Keyword matches win before any splitting.
    if 'back-end' in position.lower():
        return 'Backend Developer'
    if 'front-end' in position.lower():
        return 'Frontend Developer'
    if 'full-stack' in position.lower():
        return 'Fullstack Developer'
    # Each split recurses, so the remaining text is also checked
    # for '|', '(' and ':'.
    if '-' in position:
        return clean_title(position.split('-')[0].strip())
    if '|' in position:
        return clean_title(position.split('|')[0].strip())
    if '(' in position:
        return clean_title(position.split('(')[0].strip())
    if ':' in position:
        return clean_title(position.split(':')[0].strip())
    return position
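If the leftover dashes do turn out to be Unicode variants, one way to extend the recursive idea is to normalize them to an ASCII `-` first and then cut at the earliest delimiter with a single `re.split`. This swaps the recursion for a regular expression: because the recursive version keeps the text before the first `-`, then the text before the first `|` in what remains, and so on, it ends at the earliest of the four delimiters, which is what one split on a character class gives. Like the answer (and unlike the question, which keeps the part after `:`), it keeps the text before a colon. The dash set, the NaN guard, and the names `clean_title_v2`, `KEYWORDS`, `UNICODE_DASHES` and `DELIMITERS` are assumptions for illustration:

```python
import re

# Assumed set of Unicode dash variants to fold into ASCII '-':
# U+2010..U+2015 (hyphen, non-breaking hyphen, figure dash, en dash, em dash,
# horizontal bar) plus U+2212 (minus sign).
UNICODE_DASHES = re.compile('[\u2010-\u2015\u2212]')

# Cut at the earliest of '-', '|', '(' or ':' (same delimiters as clean_title).
DELIMITERS = re.compile(r'[-|(:]')

# Hypothetical keyword table, checked in order before any splitting.
KEYWORDS = {
    'back-end': 'Backend Developer',
    'front-end': 'Frontend Developer',
    'full-stack': 'Fullstack Developer',
}


def clean_title_v2(position):
    if not isinstance(position, str):  # leave NaN/None untouched
        return position
    # Normalize first so 'Back\u2013End' also matches the 'back-end' keyword.
    position = UNICODE_DASHES.sub('-', position)
    lowered = position.lower()
    for keyword, title in KEYWORDS.items():
        if keyword in lowered:
            return title
    # maxsplit=1: only the text before the first delimiter is needed.
    return DELIMITERS.split(position, maxsplit=1)[0].strip()


for raw in ['Senior Back\u2013End Engineer', 'Data Analyst \u2013 Remote',
            'QA | Tester', 'Designer (UX)']:
    print(repr(raw), '->', repr(clean_title_v2(raw)))
```

It would be applied the same way as before, e.g. `df['Position'].apply(clean_title_v2).value_counts()`. The regex form avoids recursion, but the ordered if-chain is easier to extend if some delimiters should later be handled differently from others.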