Reputation: 80
In a DataFrame, how can I remove the unnecessary characters from a contact number?
df
Id Phone
1 (+1)123-456-7890
2 (123)-(456)-(7890)
3 123-456-7890
Final Output
Id Phone
1 1234567890
2 1234567890
3 1234567890
Upvotes: 1
Views: 491
Reputation: 262284
I would use a regex with str.replace here:
df['Phone2'] = df['Phone'].str.replace(r'^(?:\(\+\d+\))|\D', '', regex=True)
output:
Id Phone Phone2
0 1 (+1)123-456-7890 1234567890
1 2 (123)-(456)-(7890) 1234567890
2 3 123-456-7890 1234567890
regex:
^(?:\(\+\d+\)) # match a leading country code in parentheses, e.g. (+1)
| # OR
\D # match a non-digit
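To check the pattern outside pandas, here is a minimal sketch using only the standard re module. The helper name strip_phone is hypothetical (it is not part of the answer above), and the sample strings are the three values from the question.

```python
import re

# Same pattern as above, written in verbose mode so each part can be commented.
PATTERN = re.compile(
    r"""
    ^(?:\(\+\d+\))   # a leading country code in parentheses, e.g. (+1)
    |                # OR
    \D               # any non-digit character
    """,
    re.VERBOSE,
)


def strip_phone(raw: str) -> str:
    """Hypothetical helper: drop a leading (+N) code and all non-digits."""
    return PATTERN.sub("", raw)


samples = ["(+1)123-456-7890", "(123)-(456)-(7890)", "123-456-7890"]
cleaned = [strip_phone(s) for s in samples]
print(cleaned)
```

On a DataFrame, the vectorized str.replace call shown above is the more direct route; something like df['Phone'].map(strip_phone) should give the same result if a plain function is preferred.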
The country code prefix might be important to keep, though.
Keep the prefixes:
df['Phone2'] = df['Phone'].str.replace(r'[^+\d]', '', regex=True)
output (with an extra row, Id 4, added to show a different prefix):
Id Phone Phone2
0 1 (+1)123-456-7890 +11234567890
1 2 (123)-(456)-(7890) 1234567890
2 3 123-456-7890 1234567890
3 4 (+380)123-456-7890 +3801234567890
Only drop a specific prefix (here +1):
df['Phone2'] = df['Phone'].str.replace(r'^(?:\(\+1\))|[^+\d]', '', regex=True)
# or, more flexible: also matches +1 outside parentheses, as long as a non-digit follows
df['Phone2'] = df['Phone'].str.replace(r'(?:\+1\D)|[^+\d]', '', regex=True)
output:
Id Phone Phone2
0 1 (+1)123-456-7890 1234567890
1 2 (123)-(456)-(7890) 1234567890
2 3 123-456-7890 1234567890
3 4 (+380)123-456-7890 +3801234567890
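The "keep the prefixes" and "drop one specific prefix" variants can be folded into one function. This is only a sketch with the standard re module: clean_phone and its drop_code parameter are hypothetical names, and the pattern is the "more flexible" one from above with the country code made configurable.

```python
import re
from typing import Optional


def clean_phone(raw: str, drop_code: Optional[str] = None) -> str:
    """Keep digits and '+'; optionally drop one country code (hypothetical helper).

    drop_code is the code without the plus sign, e.g. "1" for +1.
    """
    if drop_code is None:
        # Keep every prefix: remove anything that is neither '+' nor a digit.
        pattern = r"[^+\d]"
    else:
        # Remove "+<code>" only when a non-digit follows, so that dropping
        # "+1" does not eat the start of a longer code such as "+1x...".
        pattern = rf"\+{re.escape(drop_code)}\D|[^+\d]"
    return re.sub(pattern, "", raw)


kept = clean_phone("(+1)123-456-7890")
dropped = clean_phone("(+1)123-456-7890", drop_code="1")
other = clean_phone("(+380)123-456-7890", drop_code="1")
print(kept, dropped, other)
```

The trailing \D is what keeps (+380) intact when only +1 is being dropped: the pattern needs a non-digit right after the code, so a longer code that merely starts with the same digits is left alone.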
Upvotes: 4