Reputation: 11
PROBLEM: hey y'all, i have a large dataset of both domestic and international phone numbers formatted in various ways that i need to convert to a particular format based on specific criteria.
example of current phone number formats in the dataset:
as you can see, the phone number formats vary greatly and there are many more examples that i did not list. i work with datasets averaging 1000+ rows.
what i try varies depending on how much data cleanup i need to perform, but below are some of my current methods.
Approach 1: Manually editing i have attempted manually updating the phone numbers to my desired formatting. however this is time consuming and leads to user error.
Approach 2: CTRL+1 "Format Cells" i start by sorting my list of numbers. then follow ctrl+1 > Number > Custom to format the following: domestic as 000-000-0000, UK as +##-##-####-####, etc.
the issue with this method is that the numbers are stored as formatted "Custom" values. so any special spaces or characters (i.e. "-", "+") do not exist within the string. meaning that i cannot import into my crm.
i have attempted to manually add "'" at the beginning of each formatted phone number, but it removes the special formatting. e.g. ###-###-#### just becomes '##########.
Approach 3: Functions i have tried using the following functions on domestic phone numbers, but they only work if formatting follows ###-###-####. which is not always the case for the data i work with.
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"(",""),")","")," ",""),"-","")
or
=MID(A1,2,3)&MID(A1,7,3)&RIGHT(A1,4)
Approach 4: Macro i've attempted recording macros, but this does not work properly since the length/formatting of a cell value and size of a sheet always varies.
Approach 5: VBA script
i am currently exploring various scripts. there are a ton of examples on stackoverflow, but most presume clean data formatted as (###) ###-####. so the scripts do not work for me.
this post was helpful as a first step to removing all special characters from cells: Phone number format
but again, only applies to certain types of formatting.
DESIRED OUTCOME i undergo this process various times a month and am hoping somebody can help me optimize my approach.
i need domestic numbers to become ###-###-#### and international phone numbers vary, but the UK would look like +##-##-####-####. i need these characters to exist within the actual string of each cell, otherwise my crm will not accept the phone numbers.
Upvotes: 1
Views: 580
Reputation: 75930
I'm not entirely sure, but maybe this gets you going:
Formula in B1
:
=MAP(A1:A5,LAMBDA(x,LET(y,CONCAT(TEXTSPLIT(x,TEXTSPLIT(x,ROW(1:10)-1,,1),,1)),TEXT(--y,SWITCH(LEN(y),10,"###-###-####",12,"+##-##-####-####","0")))))
MAP(A1:A5,LAMBDA(x
- Loop over a given dataset;LET(y,CONCAT(TEXTSPLIT(x,TEXTSPLIT(x,ROW(1:10)-1,,1),,1))
- Part where each input gets cleared into just pure numeric characters;TEXT(--y,SWITCH(LEN(y),10,"###-###-####",12,"+##-##-####-####","0")))))
- Now use SWITCH()
to test against the length of the numeric input. If 10 or 12 we kind of know what format we like, the last parameter is the 'standard' format. But obviously you could start adding checks. In the samples given, you'd want to include options for length 11 and 15.Upvotes: 1