Diogo Fernandes

Reputation: 5

Split dataframe column after apply method

I have this code to clean LinkedIn job titles:

def clean_title(position):
    if 'back-end' in position.lower():
        return 'Backend Developer'
    elif 'front-end' in position.lower():
        return 'Frontend Developer'
    elif 'full-stack' in position.lower():
        return 'Fullstack Developer'
    elif  '-' in position.lower():
        return position.split('-')[0].strip()
    elif  '|' in position:
        return position.split('|')[0].strip()
    elif  '(' in position:
        return position.split('(')[0].strip()
    elif ':' in  position:
        return position.split(':')[1].split('-')[0].strip()
    elif '-' in  position:
        if '-' in position.split('-')[0].strip():
            return position.split('-')[0].split('-')[0].strip()
        else:
            return position.split('-')[0].strip()
    else:
        return position

df['Position'].apply(clean_title).value_counts()

After running the code above, some job titles are still incorrect. After applying the clean_title function, I want to split the titles that still contain a hyphen (-).

https://i.sstatic.net/hYKbJ.png

How can I proceed?

Upvotes: 0

Views: 33

Answers (1)

JonSG

Reputation: 13152

I think what you want is to call clean_title() recursively, so that whatever is left after one split gets cleaned again until no separator remains. Maybe an implementation like:

def clean_title(position):
    # Known keywords map straight to a canonical title.
    if 'back-end' in position.lower():
        return 'Backend Developer'

    if 'front-end' in position.lower():
        return 'Frontend Developer'

    if 'full-stack' in position.lower():
        return 'Fullstack Developer'

    # For each separator, keep the text before it and clean that part again.
    if '-' in position:
        return clean_title(position.split('-')[0].strip())

    if '|' in position:
        return clean_title(position.split('|')[0].strip())

    if '(' in position:
        return clean_title(position.split('(')[0].strip())

    if ':' in position:
        return clean_title(position.split(':')[0].strip())

    # No keyword and no separator left: the title is already clean.
    return position
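
If you would rather avoid recursion, the same idea can be sketched with a single regular-expression split. This is only a sketch: the names clean_title_flat, CANONICAL and SEPARATORS are made up for illustration and do not appear in the question. It assumes you always want the text before the first separator, which is what the recursive version above ends up returning.

```python
import re

# Keyword -> canonical title (hypothetical helper table, same keywords as above).
CANONICAL = {
    'back-end': 'Backend Developer',
    'front-end': 'Frontend Developer',
    'full-stack': 'Fullstack Developer',
}

# Any one of the separators handled above: hyphen, pipe, opening parenthesis, colon.
SEPARATORS = re.compile(r'[-|(:]')


def clean_title_flat(position):
    """Non-recursive variant: keyword lookup first, then one split."""
    lowered = position.lower()
    for keyword, title in CANONICAL.items():
        if keyword in lowered:
            return title
    # Keep only the text before the first separator of any kind.
    return SEPARATORS.split(position, maxsplit=1)[0].strip()


print(clean_title_flat('Data Analyst - Remote (Lisbon)'))
```

It can be applied the same way as in the question, for example df['Position'].apply(clean_title_flat).value_counts().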

Upvotes: 1
