Splitting a long string into numeric and alpha components with Regex Split

Question

I am making an app that reads an Excel file containing customer information and pushes this data into an SQL database. The problem is that the original designer, instead of using separate columns to store customer name, customer phone, secondary contact name, and secondary contact phone, put all of it into one long sentence in a single column.

My plan is to strip out all whitespace and non-alphanumeric characters from the entry, so that I essentially get one long string which, at its longest, could be something like this:

JeffSmith07621589641SarahSmith09854315741
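The stripping step described above could be done along these lines. This is only a minimal sketch: the `CellCleaner` class, the `StripNonAlphanumeric` helper, and the sample cell text are hypothetical, since the real layout of the spreadsheet column is not shown.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CellCleaner
{
    // Removes every character that is not an ASCII letter or digit.
    public static string StripNonAlphanumeric(string raw)
    {
        return Regex.Replace(raw, "[^a-zA-Z0-9]", "");
    }

    public static void Main()
    {
        // Hypothetical cell content; the real sentence in the spreadsheet may differ.
        string cell = "Jeff Smith 07621 589641, Sarah Smith: 09854 315741";
        Console.WriteLine(StripNonAlphanumeric(cell));
        // prints JeffSmith07621589641SarahSmith09854315741
    }
}
```

Note that this character class also removes apostrophes, hyphens, and accented letters, so names such as O'Brien would be altered.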

I intended to split the names and the numbers (and further split the names into first/last) using a regular expression. I've been trying Regex.Split like so:

String[] splitArray = Regex.Split("JeffSmith07621589641SarahSmith09854315741", 
                                  @"(?<=[a-zA-Z])(?=\d)");

I hoped to get 4 elements but instead my results are coming out like this

splitArray[0] = JeffSmith
splitArray[1] = 07621589641SarahSmith
splitArray[2] = 09854315741

As you can see, I'm not getting a split between the first phone number and the second name.

What would be the best way to extract the data?

If it's a regex, what needs to be added to the regular expression to achieve what I'm looking for?

I'm also concerned that regex may be slow, as I have around 4000 records to process in the Excel file.

Avinash Raj · Accepted Answer

Yes, just do the same for the other case, i.e. also match the boundary between a digit and a letter. Currently your regex only matches the boundary between a letter and a digit.

String[] splitArray = Regex.Split("JeffSmith07621589641SarahSmith09854315741", @"(?<=[a-zA-Z])(?=\d)|(?<=\d)(?=[a-zA-Z])");
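As a rough illustration of how this pattern could be applied to every row, the sketch below builds the regex once and reuses it. The `ContactSplitter` class and its `SplitFields` and `SplitName` helpers are hypothetical names, and the first/last name split assumes that each name part starts with a capital letter, which is not something the question guarantees.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ContactSplitter
{
    // Built once and reused for every row; RegexOptions.Compiled is optional.
    private static readonly Regex LetterDigitBoundary = new Regex(
        @"(?<=[a-zA-Z])(?=\d)|(?<=\d)(?=[a-zA-Z])", RegexOptions.Compiled);

    // Zero-width position between a lowercase letter and an uppercase letter.
    // Assumption: name parts are capitalised, as in "JeffSmith".
    private static readonly Regex CaseBoundary = new Regex(
        @"(?<=[a-z])(?=[A-Z])", RegexOptions.Compiled);

    // Splits the cleaned string wherever letters and digits meet.
    public static string[] SplitFields(string cleaned) => LetterDigitBoundary.Split(cleaned);

    // Splits a run-together name such as "JeffSmith" into its parts.
    public static string[] SplitName(string name) => CaseBoundary.Split(name);

    public static void Main()
    {
        string[] fields = SplitFields("JeffSmith07621589641SarahSmith09854315741");
        Console.WriteLine(string.Join(" | ", fields));
        // prints JeffSmith | 07621589641 | SarahSmith | 09854315741

        Console.WriteLine(string.Join(" ", SplitName(fields[0])));
        // prints Jeff Smith
    }
}
```

A name with an internal capital, such as McDonald, would be split in the wrong place by the case-based rule, so that part should be checked against the real data.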

