Regex to remove specific parts of a string in a column dataframe python

Question

I'm working with a dataframe that contains addresses, and I want to delete a specific part of each string. For example:

I want to delete everything from the word "REFERENCE:" or "reference:" to the end of the sentence. I also want to create a new column holding the result (without the word REFERENCE:/reference: and the text that follows it). Could you help me do this with regex? I want the new column to look something like this:

gold_cy · Accepted Answer

You can use some regex to obtain the desired results.

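import pandas as pd

df = pd.DataFrame({"address": ["Street Pases de la Reforma #200 REFERENCE: Green house", "Street Carranza #300 12 & 13 REFERENCE: There is a tree"]})

# findall returns a list of matches per row; explode() flattens those lists into rows
df.address.str.findall(r".+?(?=REFERENCE)").explode()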

0    Street Pases de la Reforma #200 
1       Street Carranza #300 12 & 13
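
The pattern above matches only the uppercase "REFERENCE" and does not store the result. A minimal sketch of one way to cover both spellings and write a new column, assuming a case-insensitive `str.replace` is acceptable; the column name `address_clean` and the third (lowercase) row are hypothetical and not taken from the question:

```python
import pandas as pd

# The first two rows come from the answer; the third is a hypothetical
# lowercase example added to exercise the case-insensitive match.
df = pd.DataFrame({"address": [
    "Street Pases de la Reforma #200 REFERENCE: Green house",
    "Street Carranza #300 12 & 13 REFERENCE: There is a tree",
    "Avenue Juarez #10 reference: blue door",
]})

# Remove optional whitespace, the marker, and everything after it.
# case=False makes the pattern match both "REFERENCE:" and "reference:".
# "address_clean" is an assumed name for the new column.
df["address_clean"] = df["address"].str.replace(
    r"\s*reference:.*$", "", case=False, regex=True
)

print(df["address_clean"].tolist())
```

Rows that do not contain the marker are left unchanged, and the leading `\s*` also drops the space that would otherwise trail the address.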

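Explanation of the regex pattern .+?(?=REFERENCE):

. matches any character (except for line terminators)
+? Quantifier: matches the preceding token between one and unlimited times, as few times as possible, expanding as needed (lazy)
(?=REFERENCE) Positive lookahead: asserts that the literal text REFERENCE follows at this position, without including it in the match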
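
To see why the lazy quantifier matters, here is a small sketch using the standard `re` module. The sample string is made up for illustration and deliberately contains the word REFERENCE twice:

```python
import re

# Hypothetical address in which the marker word appears twice.
text = "Street Carranza #300 REFERENCE: near the REFERENCE desk"

# Lazy: stops at the first position where "REFERENCE" follows.
lazy = re.match(r".+?(?=REFERENCE)", text).group()

# Greedy: consumes as much as possible, so it stops at the last "REFERENCE".
greedy = re.match(r".+(?=REFERENCE)", text).group()

print(repr(lazy))    # 'Street Carranza #300 '
print(repr(greedy))  # 'Street Carranza #300 REFERENCE: near the '
```

In both cases the lookahead only checks that REFERENCE follows; the word itself is never part of the match, which is why the result keeps a trailing space.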
