Reputation: 23859
I have inherited an old code file that has the following code. It seems the last line of the code below is removing all the open (
and close )
parentheses, and -
character from the phone number field.
Question: But why it is using regex='\('
in .replace(regex='\(',value='')
part of that last line? Some other online examples (such as here and here) I have seen don't seem to use regex
keyword in their replacement function. What regex='\('
is doing in the replace function here?
import sqlalchemy as sq
import pandas as pd
import re
pw = dbutils.secrets.get(scope='SomeScope',key='sql')
engine = sq.create_engine('mssql+pymssql://SERVICE.Databricks.NONPUBLICETL:'+pw+'MyAzureSQL.database.windows.net:1433/TEST', isolation_level="AUTOCOMMIT")
pandas_df = pd.read_sql('select * from SQLTable1', con=engine)
pandas_df['MOBILE_PHONE'].replace(regex='\(',value='').replace(regex='\)',value='').replace(regex='\-',value='').str.strip()
Upvotes: 1
Views: 53
Reputation: 26676
Coding precision depends on experience, logic and mastery of syntax. Its like mastery of normal language. The answer you adapted achieves exactly what the code below does
df['MOBILE_PHONE2'] = df['MOBILE_PHONE'].str.replace('[^\d]','',regex=True)
Explanation
\d
is regex for digits
[^]
is regex for everything except
[^\d]
everything except digits
So, using the pandas API, I replace everything except digits in the string with nothing
Outcome
MOBILE_PHONE MOBILE_PHONE2
0 (425) 555-1234 4255551234
Upvotes: 1
Reputation: 120469
The signature of the replace
function has changed.
Replace your last line by:
df['MOBILE_PHONE2'] = df['MOBILE_PHONE'].replace('[()-]', '', regex=True).str.strip()
print(df)
# Output
MOBILE_PHONE MOBILE_PHONE2
0 (425) 555-1234 425 5551234
Replace (
or )
or -
by ''
Upvotes: 1
Reputation: 31
You can use the regex keyword as a boolean to tell .replace() whether to interpret the string as regex or as the regular expression itself
Upvotes: 0