Reputation: 1
I have a pandas DataFrame with a column of phone numbers. I would like to format each valid number with dashes after the first three digits and after the next three, for example 306-877-9993. I would also like to remove the string "I don't have one" and the dummy phone number 999999999999. How can I do that? Thanks
    Last Name  First Name      Phone number
0      Dupont       Marie        3068779993
1        Trey         Tom       16669858121
2     Johnson        Lily      (407)6579091
3  Parmentier        John  I don't have one
4       Predi      Pamela      999999999999
Edit: This is an Excel file containing phone numbers that were entered manually, and I am looking for a way to format them and clean the file. I tried to strip the parentheses with df['Phone_number'] = df.Phone_number.str.strip('('), but I am getting NaN for some of the phone numbers.
Upvotes: 0
Views: 1800
Reputation: 245
You can use the function clean_phone() from the library DataPrep. Install it with pip install dataprep.
>>> import pandas as pd
>>> from dataprep.clean import clean_phone
>>> df = pd.DataFrame({"Phone number": [3068779993, "16669858121",
... "(407)6579091", "I don't have one", 999999999999]})
>>> clean_phone(df, "Phone number")
Phone Number Cleaning Report:
3 values cleaned (60.0%)
2 values unable to be parsed (40.0%), set to NaN
Result contains 3 (60.0%) values in the correct format and 2 null values (40.0%)
       Phone number  Phone number_clean
0        3068779993        306-877-9993
1       16669858121        666-985-8121
2      (407)6579091        407-657-9091
3  I don't have one                 NaN
4      999999999999                 NaN
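If you would rather not add a dependency, something similar can be sketched in plain pandas. This is only a rough alternative, not what clean_phone() does internally. The helper name format_phone is made up for illustration, and the rules are assumptions: a number is valid if it has 10 digits (or 11 with a leading 1), and a number made of one repeated digit is treated as a dummy. It also assumes the cells are integers or strings, not floats. Converting each value with str() first matters because the .str accessor returns NaN for non-string cells, which is probably where the NaN values in the question came from.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "Phone number": [3068779993, "16669858121", "(407)6579091",
                     "I don't have one", 999999999999]
})


def format_phone(value):
    """Hypothetical helper: return 'XXX-XXX-XXXX' or None if not parseable."""
    # Work on a string so that integer cells are handled as well.
    digits = re.sub(r"[^0-9]", "", str(value))
    # Assumption: an 11-digit number starting with 1 carries a country code.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    # Assumption: anything that is not 10 digits, or is a single repeated
    # digit (e.g. 9999999999), is treated as invalid.
    if len(digits) != 10 or len(set(digits)) == 1:
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


df["Phone number_clean"] = df["Phone number"].map(format_phone)
```

To drop the rows that could not be parsed, df.dropna(subset=["Phone number_clean"]) should work.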
Upvotes: 1