Reputation: 23859

python/pandas - substring replacement using regular expression

I have inherited an old code file that has the following code. It seems the last line of the code below is removing all the open ( and close ) parentheses, and - character from the phone number field.

Question: But why it is using regex='\(' in .replace(regex='\(',value='') part of that last line? Some other online examples (such as here and here) I have seen don't seem to use regex keyword in their replacement function. What regex='\(' is doing in the replace function here?

import sqlalchemy as sq
import pandas as pd
import re

pw = dbutils.secrets.get(scope='SomeScope',key='sql')
engine = sq.create_engine('mssql+pymssql://SERVICE.Databricks.NONPUBLICETL:'+pw+'MyAzureSQL.database.windows.net:1433/TEST', isolation_level="AUTOCOMMIT")

pandas_df = pd.read_sql('select * from SQLTable1', con=engine)

pandas_df['MOBILE_PHONE'].replace(regex='\(',value='').replace(regex='\)',value='').replace(regex='\-',value='').str.strip()

Upvotes: 1

Answers (3)

wwnde

Reputation: 26676

Coding precision depends on experience, logic and mastery of syntax. Its like mastery of normal language. The answer you adapted achieves exactly what the code below does

df['MOBILE_PHONE2'] = df['MOBILE_PHONE'].str.replace('[^\d]','',regex=True)

Explanation

\d is regex for digits

[^] is regex for everything except

[^\d] everything except digits

So, using the pandas API, I replace everything except digits in the string with nothing

Outcome

    MOBILE_PHONE  MOBILE_PHONE2
0  (425) 555-1234    4255551234

Upvotes: 1

Corralien

Reputation: 120469

The signature of the replace function has changed.

Replace your last line by:

df['MOBILE_PHONE2'] = df['MOBILE_PHONE'].replace('[()-]', '', regex=True).str.strip()
print(df)

# Output
     MOBILE_PHONE MOBILE_PHONE2
0  (425) 555-1234   425 5551234

Replace ( or ) or - by ''

Upvotes: 1

SuperPineapple

Reputation: 31

You can use the regex keyword as a boolean to tell .replace() whether to interpret the string as regex or as the regular expression itself

Upvotes: 0

python/pandas - substring replacement using regular expression

Answers (3)

Related Questions