dd.

Reputation: 801

Pandas string tokenization too slow

I have a column in a Pandas DataFrame where each row contains a job description string like 'senior data consultant', and there are approximately 1,000,000 rows. I want to shorten each string to just its first word (which in that example would give 'senior'). The code below does this without error.

def proc_Profession(df):
    # Walk every row and replace the full description with its first word
    for row in range(df['Profession'].size):
        try:
            df['Profession'].iloc[row] = df['Profession'].iloc[row].split(' ')[0]
        except AttributeError:
            # Non-string values (e.g. NaN) have no .split(), so mark them
            df['Profession'].iloc[row] = 'unknown'
    return df

The problem is that this is too slow (it takes several hours). Is there a faster way of doing it?

Upvotes: 2

Views: 288

Answers (1)

dd.

Reputation: 801

As per Henry Yik's suggestion, the following is significantly faster:

def proc_Profession(df):
    # Vectorized: split every string on whitespace and keep the first token
    df['Profession'] = df['Profession'].str.split().str[0]
    return df
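
One difference from the original loop is worth noting: the .str accessor returns NaN for non-string entries instead of raising AttributeError, so the rows the loop marked as 'unknown' come out as NaN here. A minimal sketch of how you might restore that behavior, assuming the non-string values are missing values like NaN:

def proc_Profession(df):
    # Split on whitespace and keep the first token; non-string rows become NaN
    first_word = df['Profession'].str.split().str[0]
    # Fill NaN (the rows the loop caught via AttributeError) with 'unknown'
    df['Profession'] = first_word.fillna('unknown')
    return df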

Upvotes: 1
