Reputation: 1946
Thanks for taking the time to read this.
I'm using Python pandas to merge two datasets on a column named 'title'. Some of the titles in one dataset have additional characters in parentheses, which causes the merge to fail on those rows. I'm trying to remove the parentheses and the values they contain using the code below; however, the merge still misses the updated rows.
Data sample, code and regex are below.
I'm assuming that the regex is incorrect - any thoughts?
import pandas as pd
data1 = pd.DataFrame({'id': ['a12bcde0'], 'title': ['company_a']})
data2 = pd.DataFrame({'serial_number': ['01a2b345','10ab2030'],'title':['company_a','company_a (123)']})
data2['title'].replace(regex=True,inplace=True,to_replace=r"\(.*\)",value=r'')
pd.merge(data1, data2, on='title')
Upvotes: 1
Views: 71
Reputation: 45552
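You're forgetting the whitespace before the opening parenthesis in your pattern. With r"\(.*\)", 'company_a (123)' becomes 'company_a ' (note the trailing space), which still doesn't equal 'company_a', so the merge skips that row. Use to_replace=r"\s\(.*\)" instead.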
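For reference, here is a minimal sketch of the fix applied to the sample data from the question. Two details are my own assumptions rather than part of the original suggestion: the pattern uses `\s*` instead of `\s` so that titles with no space before the parenthesis are also cleaned, and the result is assigned back to the column instead of relying on `inplace=True`.

```python
import pandas as pd

data1 = pd.DataFrame({'id': ['a12bcde0'], 'title': ['company_a']})
data2 = pd.DataFrame({'serial_number': ['01a2b345', '10ab2030'],
                      'title': ['company_a', 'company_a (123)']})

# Remove any whitespace before "(" together with the parenthesised text.
# Assumption: \s* (zero or more) instead of \s, to also cover "company_a(123)".
data2['title'] = data2['title'].replace(to_replace=r"\s*\(.*\)", value="", regex=True)

# Both rows of data2 now carry the exact title 'company_a'.
merged = pd.merge(data1, data2, on='title')
print(merged)
```

If other stray whitespace might be present in the titles, chaining `.str.strip()` on both `title` columns before merging is a common extra safeguard.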
Upvotes: 2