Reputation: 40
I'm working with a dataframe that contains addresses, and I want to delete a specific part of each string. For example:
I want to delete everything from the word "REFERENCE:" or "reference:" to the end of the string. I also want to create a new column that holds the result (without REFERENCE:/reference: and the text that follows it). Could you help me do this with regex?
I want the new column to look something like this:
Upvotes: 0
Views: 369
Reputation: 360
The regex should look like this:
import re
discard_re = re.compile('(reference:.*)', re.IGNORECASE | re.MULTILINE)
Then you can add the new column:
df['address_new'] = df.addresses.map(lambda x: discard_re.sub('', x))
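One detail worth checking: the pattern above removes the marker and what follows, but the space in front of it stays, so the cleaned value ends with a trailing blank. Here is a minimal sketch of a variant that also consumes that whitespace. The helper name `strip_reference` and the lowercase sample are made up for illustration; the sample addresses are borrowed from the other answer.

```python
import re

# \s* also swallows the whitespace in front of the marker, so nothing trails.
# IGNORECASE covers both "REFERENCE:" and "reference:".
DISCARD_RE = re.compile(r"\s*reference:.*", re.IGNORECASE)


def strip_reference(text):
    """Return text with the reference marker and everything after it removed."""
    return DISCARD_RE.sub("", text)


addresses = [
    "Street Pases de la Reforma #200 REFERENCE: Green house",
    # lowercase marker: hypothetical variant of the sample data
    "Street Carranza #300 12 & 13 reference: There is a tree",
]
cleaned = [strip_reference(a) for a in addresses]
print(cleaned)
```

The same function can be passed to `map` on the dataframe column, for example `df['addresses'].map(strip_reference)`.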
Upvotes: 1
Reputation: 14216
You can use some regex to obtain the desired results.
import pandas as pd
df = pd.DataFrame({"address": ["Street Pases de la Reforma #200 REFERENCE: Green house", "Street Carranza #300 12 & 13 REFERENCE: There is a tree"]})
df.address.str.findall(r".+?(?=REFERENCE)").explode()
0 Street Pases de la Reforma #200
1 Street Carranza #300 12 & 13
Explanation of the regex pattern:
. matches any character (except for line terminators)
+? is a lazy quantifier: it matches between one and unlimited times, as few times as possible, expanding as needed
(?=REFERENCE) is a positive lookahead: it asserts that REFERENCE follows at this position, without including it in the match
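To see the lazy quantifier and the lookahead working together outside pandas, here is a small sketch with the standard `re` module. It makes three assumptions beyond the pattern above: the lookahead is made case-insensitive, it includes the leading whitespace so the result has no trailing blank, and it falls back to the end of the string when no marker is present. The helper name `address_prefix` and the last two sample addresses are made up.

```python
import re

# Lazy prefix, then a lookahead that checks for the marker (or the end of
# the string) without consuming it.
PREFIX_RE = re.compile(r".+?(?=\s*reference:|$)", re.IGNORECASE)


def address_prefix(text):
    """Return the part of text that precedes the reference marker."""
    match = PREFIX_RE.match(text)
    # An empty string gives no match; return it unchanged in that case.
    return match.group(0) if match else text


samples = [
    "Street Pases de la Reforma #200 REFERENCE: Green house",
    "Avenue Juarez #10 reference: blue door",  # hypothetical lowercase marker
    "Street Hidalgo #5",                       # hypothetical row without a marker
]
prefixes = [address_prefix(s) for s in samples]
print(prefixes)
```

Because the lookahead does not consume anything, the marker itself never ends up in the match; only the text before it is returned.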
Upvotes: 1