pretzelhammer

Reputation: 15135

Create string templates from arbitrary regular expressions?

Regular expressions are normally used to parse strings that are already formatted, but I would like to use them the other way around: to take a raw string of characters and format it. Examples:

// phone number
format("\(\d{3}\) \d{3}-\d{4}", "1234567890");
// should return "(123) 456-7890"
// date
format("\d{4}-\d{2}-\d{2}", "20180712");
// should return "2018-07-12"
// arbitrary
format("([A-Z]+-\d+ )+", "ABC123DEFGH45IJ6789");
// should return "ABC-123 DEFGH-45 IJ-6789 "

The above are just examples. I'd like a general solution that works for any arbitrary regex and any arbitrary string (that fits the regex).

Here's what I have so far, which is a little inelegant, and really limited in its abilities, but does satisfy the first 2 of the 3 examples above:

function consumeCharacters(amount) {
  return (characterArray) => {
    return characterArray.splice(0, amount).join('');
  };
}

function parseSimpleRegex(regexString) {
  // filter out backslash escapes
  let parsed = regexString.replace(/\\./g, (...args) => {
    return args[0][args[0].length-1];
  });
  
  // get literal characters
  let literals = parsed.split(/d\{\d\}/);
  
  // get variable symbols
  let variables = parsed.match(/d\{\d\}/g);
  let varFunctions = variables.map(variable => consumeCharacters(variable[2]));
  
  let result = [];
  while (literals.length > 0) {
    result.push(literals.shift());
    result.push(varFunctions.shift());
  }
  while (varFunctions.length > 0) {
    result.push(varFunctions.shift());     
  }
  
  // filter out undefineds & empty strings
  result = result.filter(resultPart => !!resultPart);
  return result;
}

function format(regexString, rawString) {
  let rawCharacters = rawString.split('');
  let formatter = null;
  try {
    formatter = parseSimpleRegex(regexString); 
  } catch (e) {
    return 'failed parsing regex';
  }
  let formattedString = formatter.map((format) => {
    if (typeof format === 'string') {
      return format;
    }
    if (typeof format === 'function') {
      return format(rawCharacters);
    }
  }).join('');
  return formattedString;
}

const testCases = [
  {
    args: ["\\(\\d{3}\\) \\d{3}-\\d{4}", "1234567890"],
    expected: "(123) 456-7890"
  },
  {
    args: ["\\d{4}-\\d{2}-\\d{2}", "20180712"],
    expected: "2018-07-12"
  },
  {
    args: ["([A-Z]+-\\d+ )+", "ABC123DEFGH45IJ6789"],
    expected: "ABC-123 DEFGH-45 IJ-6789 "
  },
];

testCases.forEach((testCase, index) => {
  const result = format(...testCase.args);
  const expected = testCase.expected;
  if (result === expected) {
    console.log(`Test Case #${index+1} passed`);
  } else {
    console.log(`Test Case #${index+1} failed, expected: "${expected}", result: "${result}"`);
  }
});

Can the above solution be scaled for more complex regexes? Or is there a better alternative approach?

Upvotes: 1

Views: 89

Answers (2)

Michał Turczyn

Reputation: 37440

You could use the pattern (\d{3})(\d{3})(\d{4}) and substitute it with \1-\2-\3, which yields 123-456-7890.

For the third example, use (\w{3})(\w{3})(\w{5})(\w{2})(\w{2})(\w{4}) and replace it with \1-\2 \3-\4 \5-\6 (written $1-$2 $3-$4 $5-$6 in JavaScript), which returns ABC-123 DEFGH-45 IJ-6789.

In general, use (\w{n})...(\w{m}), where n and m are integers, to capture parts of the string into particular groups (you could specify those integers with an array). You could also provide the separators in an array to form your patterns.

Demo

UPDATE

As I said, a general solution would be to supply the sizes of the blocks that the string should be split into, along with an array of separators. See the code below:

var str =  "ABC123DEFGH45IJ6789";
var blockSizes = [3,3,5,2,2,4];
var separators = ["-"," ","-"," ","-"];
var pattern = "(\\w{" + blockSizes[0] + "})";
var replacementPattern = "$1";
var i;
for(i = 1; i < blockSizes.length; i++)
{
    pattern += "(\\w{" + blockSizes[i] + "})";
    replacementPattern += separators[i - 1] + "$" + (i + 1);
}

Now just use these patterns in a replace call and you're done:

JS fiddle

Regex demo
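For readers without access to the linked demos, here is a minimal sketch of the final step, applying the generated pattern with String.prototype.replace. The helper name formatBlocks and the ^...$ anchors are assumptions added for illustration, not part of the answer's code. It also assumes the separators contain no $ characters, which have a special meaning in replacement strings.

```javascript
// Hypothetical helper: builds the capture pattern and the replacement
// string from block sizes and separators, then applies them.
function formatBlocks(str, blockSizes, separators) {
  let pattern = "";
  let replacement = "";
  blockSizes.forEach((size, i) => {
    pattern += "(\\w{" + size + "})";          // one capture group per block
    if (i > 0) replacement += separators[i - 1]; // separator before every block but the first
    replacement += "$" + (i + 1);              // backreference to the group
  });
  // Anchors (an assumption) make the whole string match or nothing at all.
  return str.replace(new RegExp("^" + pattern + "$"), replacement);
}

const blocksResult = formatBlocks(
  "ABC123DEFGH45IJ6789",
  [3, 3, 5, 2, 2, 4],
  ["-", " ", "-", " ", "-"]
);
console.log(blocksResult); // "ABC-123 DEFGH-45 IJ-6789"
```

If the input does not fit the block sizes, replace finds no match and the string comes back unchanged.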

Upvotes: 1

Poul Bak

Reputation: 10930

The general answer is: Use a regex that creates groups, then use replace with backreferences to format the output.

For example, using your first example, use this regex:

/(\d{3})(\d{3})(\d{4})/

It creates three groups: the first 3 digits, the next 3 digits and the final 4 digits.

Now for the format. Use the string.replace function with the following replacement pattern:

($1) $2-$3

It will add parentheses around the first group, add a space, then the second group, and finally a hyphen and the last group.

How to use:

You can create your formatPhone function like this:

function formatPhone(rawPhone)
{
    return rawPhone.replace(/(\d{3})(\d{3})(\d{4})/, '($1) $2-$3');
}

You can do something similar with your other patterns.
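As a sketch of how the same idea might look for the date example from the question (the function name formatDate and the ^...$ anchors are assumptions for illustration):

```javascript
// Hypothetical counterpart to formatPhone for the "20180712" example.
function formatDate(rawDate) {
  // Three groups: 4-digit year, 2-digit month, 2-digit day.
  // The anchors (an assumption) reject inputs with extra characters.
  return rawDate.replace(/^(\d{4})(\d{2})(\d{2})$/, '$1-$2-$3');
}

console.log(formatDate("20180712")); // "2018-07-12"
```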

Edit:

A totally general solution requires that you pass the raw string, the regex pattern and the replacement pattern to your function, like this:

function format(rawString, regex, replacement)
{
   return rawString.replace(regex, replacement);
}

where regex and replacement must follow the rules described above.
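A possible usage sketch for the third example from the question: with the global flag, a single pair of groups can be reused for every letters-then-digits run, so the number of blocks does not have to be known in advance. The specific regex and replacement below are assumptions chosen to reproduce the expected output, including its trailing space.

```javascript
function format(rawString, regex, replacement) {
  return rawString.replace(regex, replacement);
}

// Each match is a run of capital letters followed by a run of digits;
// the g flag applies the replacement to every such run.
const arbitrary = format("ABC123DEFGH45IJ6789", /([A-Z]+)(\d+)/g, "$1-$2 ");
console.log(arbitrary); // "ABC-123 DEFGH-45 IJ-6789 "
```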

Edit2:

I think you have misunderstood something here. Let's take your first example:

format("\(\d{3}\) \d{3}-\d{4}", "1234567890");

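Here the regex simply doesn't match the raw string: it expects literal parentheses, a space and a hyphen, none of which appear in "1234567890". So in short, you can't make a function that takes a format regex. Regexes are made to match (and possibly replace), as shown above.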
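A quick check that illustrates the point (the variable names are only for illustration): the pattern describes the formatted output, so it matches that output but not the raw input.

```javascript
// The "format regex" from the question, written as a regex literal.
const formatRegex = /\(\d{3}\) \d{3}-\d{4}/;

const matchesRaw = formatRegex.test("1234567890");           // false
const matchesFormatted = formatRegex.test("(123) 456-7890"); // true

console.log(matchesRaw, matchesFormatted);
```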

Upvotes: 2
