danishxr

Reputation: 89

Removing special characters from a dataframe column of phone numbers

I have a dataframe df:

Name    phone_number    status
john    8967894567      FC
john    8967894567      FC
john    7846897345      CL 
john    78.478954+89    FC
john    78.478954+89    FC
Ram     4598761458      FC
Ram     4598761458      FC
Kevin   15.478945+67    CL

I want to change it in order to have the following result:

Name    phone_number    status
john    8967894567      FC
john    8967894567      FC
john    7846897345      CL 
john    7847895489      FC
john    7847895489      FC
Ram     4598761458      FC
Ram     4598761458      FC
Kevin   1547894567      CL

I tried to use re.sub like this:

import re
df['phone_number'] = re.sub('[.+]', '', df['phone_number'])

but that resulted in this:

Name    phone_number             status
john    0  0 8967894567\n1  1547894567  FC
john    0  0 8967894567\n1  1547894567  FC
john    0  0 7846897345\n1  1547894567  CL 
john    0  0 7847895489\n1   1547894567  FC
john    0  0 7847895489\n1   1547894567  FC
Ram     0  0 4598761458\n1  1547894567  FC
Ram     0  0 4598761458\n1  1547894567  FC
Kevin   0  0 1547894567\n1  1547894567  CL

What am I doing wrong?

Upvotes: 2

Views: 3926

Answers (1)

cs95

Reputation: 402263

Don't use re.sub here: it operates on a single string, not on a pandas Series. Use the Series method str.replace instead, which applies the substitution to each element.

# regex=True is needed on pandas 2.0+, where str.replace defaults to literal matching
df.phone_number = df.phone_number.str.replace(r'[^\d]+', '', regex=True)
df

    Name phone_number status
0   john   8967894567     FC
1   john   8967894567     FC
2   john   7846897345     CL
3   john   7847895489     FC
4   john   7847895489     FC
5    Ram   4598761458     FC
6    Ram   4598761458     FC
7  Kevin   1547894567     CL

The pattern [^\d]+ matches any run of one or more non-digit characters, and each match is replaced with an empty string, so only the digits remain.
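
For anyone who wants to try this end to end, here is a minimal sketch. It assumes the phone_number column holds strings (the sample frame below is a shortened, made-up copy of the question's data), and it writes the pattern as \D+, which is equivalent to [^\d]+. It also shows an element-wise alternative with a compiled pattern, for cases where you would rather keep using the re module.

```python
import re
import pandas as pd

# Shortened sample mirroring the question; values are strings so .str applies.
df = pd.DataFrame({
    "Name": ["john", "john", "Kevin"],
    "phone_number": ["8967894567", "78.478954+89", "15.478945+67"],
    "status": ["FC", "FC", "CL"],
})

# Vectorized: strip every run of non-digits. regex=True is passed explicitly
# because pandas 2.0+ treats the pattern as a literal string by default.
cleaned = df["phone_number"].astype(str).str.replace(r"\D+", "", regex=True)

# Element-wise alternative: re.sub handles one string at a time,
# so it is applied to each value through Series.map.
non_digits = re.compile(r"\D+")
cleaned_map = df["phone_number"].astype(str).map(lambda s: non_digits.sub("", s))

df["phone_number"] = cleaned
print(df)
```

Both approaches give the same values here. The str.replace version is usually the more idiomatic choice in pandas, while the map version can be handy if the cleaning logic grows beyond a single substitution.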

Upvotes: 4
