Reputation: 15135
Regular expressions are normally used to parse already-formatted strings, but I would like to use them to take raw strings of characters and format them. Examples:
// phone number
format("\\(\\d{3}\\) \\d{3}-\\d{4}", "1234567890");
// should return "(123) 456-7890"
// date
format("\\d{4}-\\d{2}-\\d{2}", "20180712");
// should return "2018-07-12"
// arbitrary
format("([A-Z]+-\\d+ )+", "ABC123DEFGH45IJ6789");
// should return "ABC-123 DEFGH-45 IJ-6789 "
The above are just examples, I'd like a general solution that works for any arbitrary regex and any arbitrary string (that fits the regex).
Here's what I have so far. It is a little inelegant and quite limited, but it does satisfy the first two of the three examples above:
function consumeCharacters(amount) {
  // returns a function that removes `amount` characters from the front of the array
  return (characterArray) => {
    return characterArray.splice(0, amount).join('');
  };
}
function parseSimpleRegex(regexString) {
  // filter out backslash escapes, keeping only the escaped character
  let parsed = regexString.replace(/\\./g, (match) => {
    return match[match.length - 1];
  });
  // get literal characters (the text between "d{n}" tokens)
  let literals = parsed.split(/d\{\d+\}/);
  // get variable symbols such as "d{3}" (multi-digit counts allowed)
  let variables = parsed.match(/d\{\d+\}/g);
  // "d{3}" -> a function that consumes 3 characters
  let varFunctions = variables.map(variable => consumeCharacters(Number(variable.slice(2, -1))));
  let result = [];
  // interleave literals and consumers
  while (literals.length > 0) {
    result.push(literals.shift());
    result.push(varFunctions.shift());
  }
  while (varFunctions.length > 0) {
    result.push(varFunctions.shift());
  }
  // filter out undefineds & empty strings
  result = result.filter(resultPart => !!resultPart);
  return result;
}
function format(regexString, rawString) {
  let rawCharacters = rawString.split('');
  let formatter = null;
  try {
    formatter = parseSimpleRegex(regexString);
  } catch (e) {
    return 'failed parsing regex';
  }
  let formattedString = formatter.map((part) => {
    if (typeof part === 'string') {
      return part;
    }
    if (typeof part === 'function') {
      return part(rawCharacters);
    }
  }).join('');
  return formattedString;
}
const testCases = [
  {
    args: ["\\(\\d{3}\\) \\d{3}-\\d{4}", "1234567890"],
    expected: "(123) 456-7890"
  },
  {
    args: ["\\d{4}-\\d{2}-\\d{2}", "20180712"],
    expected: "2018-07-12"
  },
  {
    args: ["([A-Z]+-\\d+ )+", "ABC123DEFGH45IJ6789"],
    expected: "ABC-123 DEFGH-45 IJ-6789 "
  },
];
testCases.forEach((testCase, index) => {
  const result = format(...testCase.args);
  const expected = testCase.expected;
  if (result === expected) {
    console.log(`Test Case #${index+1} passed`);
  } else {
    console.log(`Test Case #${index+1} failed, expected: "${expected}", result: "${result}"`);
  }
});
Can the above solution be scaled for more complex regexes? Or is there a better alternative approach?
Upvotes: 1
Views: 89
Reputation: 37440
You could use the pattern (\d{3})(\d{3})(\d{4}) and substitute it with \1-\2-\3 (written $1-$2-$3 in a JavaScript replacement string), which yields 123-456-7890.
For the third example, use (\w{3})(\w{3})(\w{5})(\w{2})(\w{2})(\w{4}) and replace it with \1-\2 \3-\4 \5-\6, which returns ABC-123 DEFGH-45 IJ-6789.
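A minimal JavaScript sketch of the two substitutions above. JavaScript's String.prototype.replace refers to capture groups as $1, $2, ... in the replacement string, so \1-\2-\3 becomes $1-$2-$3. The helper names formatPhoneDashed and formatFixedBlocks are hypothetical, chosen only for illustration.

```javascript
// Hypothetical helper names, for illustration only.
// JavaScript replacement strings refer to groups as $1, $2, ... (not \1, \2).
function formatPhoneDashed(raw) {
  // three digit groups of sizes 3, 3 and 4, joined with hyphens
  return raw.replace(/(\d{3})(\d{3})(\d{4})/, "$1-$2-$3");
}

function formatFixedBlocks(raw) {
  // six word-character groups of fixed sizes 3, 3, 5, 2, 2, 4
  return raw.replace(/(\w{3})(\w{3})(\w{5})(\w{2})(\w{2})(\w{4})/, "$1-$2 $3-$4 $5-$6");
}

console.log(formatPhoneDashed("1234567890"));          // 123-456-7890
console.log(formatFixedBlocks("ABC123DEFGH45IJ6789")); // ABC-123 DEFGH-45 IJ-6789
```

Note that this output has no trailing space, unlike the expected string in the question's third test case.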
In general, use (\w{n})...(\w{m}), where n and m are integers, to capture parts of a string into particular groups (you could specify those integers with an array). You could also provide the separators in an array to build your replacement pattern.
UPDATE
As I said, the general solution is to supply the sizes of the blocks the string should be split into, plus an array of separators. See the code below:
var str = "ABC123DEFGH45IJ6789";
var blockSizes = [3,3,5,2,2,4];
var separators = ["-"," ","-"," ","-"];
var pattern = "(\\w{" + blockSizes[0] + "})";
var replacementPattern = "$1";
var i;
for(i = 1; i < blockSizes.length; i++)
{
pattern += "(\\w{" + blockSizes[i] + "})";
replacementPattern += separators[i - 1] + "$" + (i + 1);
}
Now just use these patterns with replace, and you're done.
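The final replace call is not shown above, so here is a minimal sketch that wraps the same pattern-building loop in a function and applies it with String.prototype.replace and new RegExp. The name formatByBlocks is hypothetical, and the sketch assumes blockSizes has exactly one more entry than separators.

```javascript
// Hypothetical wrapper around the pattern-building loop above.
// Assumes blockSizes.length === separators.length + 1.
function formatByBlocks(str, blockSizes, separators) {
  let pattern = "(\\w{" + blockSizes[0] + "})";
  let replacementPattern = "$1";
  for (let i = 1; i < blockSizes.length; i++) {
    pattern += "(\\w{" + blockSizes[i] + "})";
    // separator between group i and group i + 1, then a reference to group i + 1
    replacementPattern += separators[i - 1] + "$" + (i + 1);
  }
  return str.replace(new RegExp(pattern), replacementPattern);
}

console.log(formatByBlocks("ABC123DEFGH45IJ6789", [3, 3, 5, 2, 2, 4], ["-", " ", "-", " ", "-"]));
// ABC-123 DEFGH-45 IJ-6789
console.log(formatByBlocks("20180712", [4, 2, 2], ["-", "-"]));
// 2018-07-12
```

Since the separators are pasted into a replacement string, a separator containing "$" would need to be written as "$$" to appear literally.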
Upvotes: 1
Reputation: 10930
The general answer is: use a regex that creates groups, then use replace with backreferences to format the output.
For example, for your first case, use this regex:
/(\d{3})(\d{3})(\d{4})/
It creates three groups: the first 3 digits, the next 3 digits and the final 4 digits.
Now, to format, use the string.replace function with the following replacement pattern:
($1) $2-$3
It adds parentheses around the first group, then a space, the second group, a hyphen and finally the last group.
How to use:
You can create your formatPhone function like this:
function formatPhone(rawPhone)
{
  return rawPhone.replace(/(\d{3})(\d{3})(\d{4})/, '($1) $2-$3');
}
You can do similar with your other patterns.
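For instance, the date example follows the same group-and-replace idea. A minimal sketch, where formatDate is a hypothetical name mirroring formatPhone:

```javascript
// Hypothetical helper, following the same group-and-replace idea as formatPhone.
function formatDate(rawDate) {
  // groups: four-digit year, two-digit month, two-digit day
  return rawDate.replace(/(\d{4})(\d{2})(\d{2})/, "$1-$2-$3");
}

console.log(formatDate("20180712")); // 2018-07-12
```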
Edit:
A fully general solution requires that you pass the raw string, the regex pattern and the replacement pattern to your function, like this:
function format(rawString, regex, replacement)
{
  return rawString.replace(regex, replacement);
}
where regex and replacement must follow the rules described above.
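A sketch of this general function run against the question's three test cases, with the arguments in the answer's order (raw string, regex, replacement). For the third case this swaps in a technique the answer does not mention: a global (g) regex with variable-length groups, ([A-Z]+)(\d+), applied repeatedly, which reproduces the expected output including the trailing space without needing fixed block sizes.

```javascript
function format(rawString, regex, replacement) {
  return rawString.replace(regex, replacement);
}

// Test cases adapted from the question: each supplies a grouping regex and a replacement pattern.
const cases = [
  { args: ["1234567890", /(\d{3})(\d{3})(\d{4})/, "($1) $2-$3"], expected: "(123) 456-7890" },
  { args: ["20180712", /(\d{4})(\d{2})(\d{2})/, "$1-$2-$3"], expected: "2018-07-12" },
  // The g flag repeats the letters-then-digits match across the whole string.
  { args: ["ABC123DEFGH45IJ6789", /([A-Z]+)(\d+)/g, "$1-$2 "], expected: "ABC-123 DEFGH-45 IJ-6789 " },
];

cases.forEach(({ args, expected }, index) => {
  const result = format(...args);
  console.log(`Test Case #${index + 1} ${result === expected ? "passed" : "failed"}`);
});
```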
Edit2:
I think you have misunderstood something here. Let's take your first example:
format("\\(\\d{3}\\) \\d{3}-\\d{4}", "1234567890");
Here the regex simply doesn't match! The raw string "1234567890" contains none of the parentheses, the space or the hyphen that the regex requires. So, in short, you can't make a function that takes a format regex. Regexes are made to match (and possibly replace), as shown above.
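A quick check of this point, compiling the question's first pattern with new RegExp (the variable name formatRegex is just for illustration): it rejects the raw input and accepts only text that is already formatted.

```javascript
// The question's first "format" regex, compiled from the same string as in its test cases.
const formatRegex = new RegExp("\\(\\d{3}\\) \\d{3}-\\d{4}");

console.log(formatRegex.test("1234567890"));     // false: no "(", ")", space or "-" in the raw input
console.log(formatRegex.test("(123) 456-7890")); // true: it only matches already-formatted text
```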
Upvotes: 2