Reputation: 89
I have a dataframe df:
Name phone_number status
john 8967894567 FC
john 8967894567 FC
john 7846897345 CL
john 78.478954+89 FC
john 78.478954+89 FC
Ram 4598761458 FC
Ram 4598761458 FC
Kevin 15.478945+67 CL
I want to change it in order to have the following result:
Name phone_number status
john 8967894567 FC
john 8967894567 FC
john 7846897345 CL
john 7847895489 FC
john 7847895489 FC
Ram 4598761458 FC
Ram 4598761458 FC
Kevin 1547894567 CL
I tried to use re.sub like this:
import re
df['phone_number'] = re.sub('[.+]', '', df['phone_number'])
but that resulted in this:
Name phone_number status
john 0 0 8967894567\n1 1547894567 FC
john 0 0 8967894567\n1 1547894567 FC
john 0 0 7846897345\n1 1547894567 CL
john 0 0 7847895489\n1 1547894567 FC
john 0 0 7847895489\n1 1547894567 FC
Ram 0 0 4598761458\n1 1547894567 FC
Ram 0 0 4598761458\n1 1547894567 FC
Kevin 0 0 1547894567\n1 1547894567 CL
What am I doing wrong?
Upvotes: 2
Views: 3926
Reputation: 402263
Don't use re.sub here: it expects a single string, not a dataframe column. Use the column's str.replace method instead.
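# regex=True is needed on pandas 2.0+, where str.replace matches the pattern literally by default
df.phone_number = df.phone_number.str.replace(r'[^\d]+', '', regex=True)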
df
Name phone_number status
0 john 8967894567 FC
1 john 8967894567 FC
2 john 7846897345 CL
3 john 7847895489 FC
4 john 7847895489 FC
5 Ram 4598761458 FC
6 Ram 4598761458 FC
7 Kevin 1547894567 CL
The pattern [^\d]+ matches any run of one or more non-digit characters, and each such run is replaced with an empty string.
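To see what the pattern does on its own, here is a minimal sketch using only the standard re module on plain strings. The sample values are copied from the question; the names samples, non_digits and cleaned are just illustrative.

```python
import re

# Sample values copied from the question's phone_number column.
samples = ["8967894567", "78.478954+89", "15.478945+67"]

# [^\d]+ matches one or more consecutive characters that are not digits.
non_digits = re.compile(r"[^\d]+")

# re works on one string at a time, so apply the substitution per element.
cleaned = [non_digits.sub("", s) for s in samples]
print(cleaned)  # ['8967894567', '7847895489', '1547894567']

# \D is shorthand for [^\d], so this pattern gives the same result.
same = [re.sub(r"\D+", "", s) for s in samples]
print(same == cleaned)  # True
```

The column's str.replace performs this kind of substitution on every element for you, which is why it fits a dataframe better than calling re.sub on the whole column.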
Upvotes: 4