Fastest way for substring replacement in pandas

Question

I have a list of substrings that I want to replace with ' '. What is the fastest way to do this? Is it possible with Cython? My current attempts are really slow on 1 million rows, so I'm looking for the fastest execution.

Example:

import re
import pandas as pd

df = pd.DataFrame({
    "text": [
        "first text to replace",
        "second text to replace",
        "test this string",
        "this is not the first string",
        "short string test",
    ]
})

removal_list = ["text to replace", "this string"]

Some attempts:

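def replace_str(df, col, removal_list):
    # One vectorized pass over the column per substring
    for item in removal_list:
        # regex=False treats each item as a literal substring
        df[col] = df[col].str.replace(item, ' ', regex=False)
    return df

replace_str(df, 'text', removal_list)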



def replace_text(text):
    # Build one compiled pattern per substring, then apply them in turn
    miscdict_comp = {re.compile(a): ' ' for a in removal_list}
    for pattern, replacement in miscdict_comp.items():
        text = pattern.sub(replacement, text)
    return text

df['text'] = df['text'].apply(replace_text)
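
Both attempts above scan every row once per substring. One direction that might cut that down is to join all substrings into a single escaped alternation and substitute in one pass. The following is only a sketch under assumptions: the helper name replace_all is made up for illustration, every value is assumed to be a plain str (no NaN), and a plain list stands in for the column so the snippet runs without pandas. Whether it is actually faster on 1 million rows would need to be timed.

```python
import re

# The example rows from the question, as a plain list standing in for df["text"].
texts = [
    "first text to replace",
    "second text to replace",
    "test this string",
    "this is not the first string",
    "short string test",
]
removal_list = ["text to replace", "this string"]

# Longest substrings first, so a longer entry is tried before a shorter one
# that starts the same way. re.escape keeps each entry a literal match.
alternatives = sorted(removal_list, key=len, reverse=True)
pattern = re.compile("|".join(re.escape(s) for s in alternatives))

def replace_all(values, compiled_pattern, replacement=" "):
    """Hypothetical helper: replace every match in one pass per string.

    Assumes every element of `values` is a str.
    """
    return [compiled_pattern.sub(replacement, v) for v in values]

result = replace_all(texts, pattern)
```

With the DataFrame from the question, this could be applied as df['text'] = replace_all(df['text'].tolist(), pattern). Note that a single pass is not always equivalent to the sequential loops: if one replacement creates text that a later substring would match, the loop version replaces it and the single-pass version does not.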
