Arunvairavan V

Reputation: 109

Regex Issue for Title Case on String Containing HTML Markup

Currently I'm running the following replacement approach ...

const str = '<span style="font-weight:bold;color:Blue;">ch</span>edilpakkam,tiruvallur';
const rex = (/(\b[a-z])/g);
 
const result = str.toLowerCase().replace(rex, function (letter) {
  //console.log(letter.toUpperCase())
  return letter.toUpperCase();
});

console.log(result);

... with a source of ...

<span style="font-weight:bold;color:Blue;">ch</span>edilpakkam,tiruvallur

... and the following result ...

<Span Style="Font-Weight:Bold;Color:Blue;">Ch</Span>Edilpakkam,Tiruvallur

But what I want to achieve is the following ...

  1. Keep the span bound to the string (the markup itself must not change).
  2. Uppercase the first letter of the text and of each following word.
  3. Expected output:

<span style="font-weight:bold;color:Blue;">Ch</span>edilpakkam,Tiruvallur

Upvotes: 2

Views: 109

Answers (2)

Jobelle

Reputation: 2834

Try the following. Note that it expects the plain text (without the markup) as input and adds the span itself:


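function formatText(str) {
  // Uppercase the first letter after every word boundary.
  const capitalized = str.replace(/\b[a-z]/gi, letter => letter.toUpperCase());
  // Wrap the first two letters of each line in the styled span.
  return capitalized.replace(/^([a-z]{2})(.*)/gim, "<span style='font-weight:bold;color:Blue;'>$1</span>$2");
}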
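A quick check of the function above, copied so the snippet runs on its own (e.g. in Node.js). It yields the expected output only when fed the plain word; fed the question's markup, it reproduces the original problem, because tag and attribute names get capitalized too and no span is added.

```javascript
// Same logic as formatText above: capitalize word starts, then wrap
// the first two letters of each line in the styled span.
function formatText(str) {
  const capitalized = str.replace(/\b[a-z]/gi, letter => letter.toUpperCase());
  return capitalized.replace(/^([a-z]{2})(.*)/gim, "<span style='font-weight:bold;color:Blue;'>$1</span>$2");
}

// Plain text input:
console.log(formatText('chedilpakkam,tiruvallur'));
// → <span style='font-weight:bold;color:Blue;'>Ch</span>edilpakkam,Tiruvallur

// Markup input (the question's source string):
console.log(formatText('<span style="font-weight:bold;color:Blue;">ch</span>edilpakkam,tiruvallur'));
// → <Span Style="Font-Weight:Bold;Color:Blue;">Ch</Span>Edilpakkam,Tiruvallur
```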

Upvotes: 0

Peter Seliger

Reputation: 13432

Toto already commented on the difficulties of "parsing" HTML code via regex.

The following generic (markup-agnostic) approach uses a sandbox-like div element in order to benefit from the browser's DOM parsing and traversal capabilities.

First, one needs to collect all text nodes of the temporary sandbox. Then, for each text node's textContent, one has to decide whether capitalization should include the very first word of the string or not.

The cases for capitalizing every word within a string, including the first one, are ...

  • The text-node's previous sibling either does not exist ...
  • ... or is a block-level element.
  • The text-node itself starts with a whitespace(-sequence).

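In all other cases one still wants to capitalize every first word character after a word boundary, except for the word at the very beginning of the text node's content, because that word continues one started in the preceding inline element (e.g. "ch" + "edilpakkam").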
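The decision rules above can be tried without a DOM. In this minimal sketch, the helper `capitalizeText` and its `prevNodeName` parameter are hypothetical stand-ins for a text node and `textNode.previousSibling.nodeName` (with `null` meaning "no previous sibling"). The negative lookbehind `(?<!^)` assumes an engine with ES2018 lookbehind support.

```javascript
// Block-level tag names copied from the answer's regex; the `g` flag is
// dropped since a single test() call does not need it.
const BLOCK_TAG = /^(?:address|article|aside|blockquote|details|dialog|dd|div|dl|dt|fieldset|figcaption|figure|footer|form|h1|h2|h3|h4|h5|h6|header|hgroup|hr|li|main|nav|ol|p|pre|section|table|ul)$/;

// Hypothetical DOM-free helper: prevNodeName mimics previousSibling.nodeName.
function capitalizeText(text, prevNodeName) {
  const startsFresh =
    prevNodeName === null ||                      // no previous sibling
    BLOCK_TAG.test(prevNodeName.toLowerCase()) || // previous sibling is a block
    /^\s/.test(text);                             // text starts with whitespace
  const regex = startsFresh
    ? /\b(\w)/g         // capitalize every word, including the first one
    : /(?<!^)\b(\w)/g;  // skip the word at the very start of the text
  return text.replace(regex, (match, first) => first.toUpperCase());
}

console.log(capitalizeText('ch', null));                       // → Ch
console.log(capitalizeText('edilpakkam,tiruvallur', 'SPAN'));  // → edilpakkam,Tiruvallur
console.log(capitalizeText(' edilpakkam,tiruvallur', 'SPAN')); // leading space kept: ' Edilpakkam,Tiruvallur'
console.log(capitalizeText('edilpakkam,tiruvallur', 'DIV'));   // → Edilpakkam,Tiruvallur
```

Combining the first two results gives "Ch" inside the span followed by "edilpakkam,Tiruvallur", which is the output the question asks for.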

function collectContentTextNodesRecursively(list, node) {
  return list.concat(
    (node.nodeType === 1) // element-node?

    ? Array
      .from(node.childNodes)
      .reduce(collectContentTextNodesRecursively, [])

    : (node.nodeType === 3) // text-node?
      ? node
      : []
  );
}

function getNodeSpecificWordCapitalizingRegex(textNode) {
  const prevNode = textNode.previousSibling;
  // Treat a missing previous sibling or a block-level one as a word boundary.
  const isAssumeBlockBefore = (prevNode === null) ||
    (/^(?:address|article|aside|blockquote|details|dialog|dd|div|dl|dt|fieldset|figcaption|figure|footer|form|h1|h2|h3|h4|h5|h6|header|hgroup|hr|li|main|nav|ol|p|pre|section|table|ul)$/)
      .test(prevNode.nodeName.toLowerCase());

  // Either a block precedes the text, or the text itself starts with whitespace ...
  return (isAssumeBlockBefore || (/^\s+/).test(textNode.textContent))

    // ... then capture every first word character after a word boundary.
    ? (/\b(\w)/g)
    // Otherwise capture every first word character after a word boundary,
    // except at the very beginning of the text (no `m` flag, so `^` is the string start).
    : (/(?<!^)\b(\w)/g);
}


function capitalizeEachTextContentWordWithinCode(code) {
  const sandbox = document.createElement('div');
  sandbox.innerHTML = code;

  collectContentTextNodesRecursively([], sandbox).forEach(textNode => {

    textNode.textContent = textNode.textContent.replace(
      getNodeSpecificWordCapitalizingRegex(textNode),
      (match, capture) => capture.toUpperCase()
    ); 
  });
  return sandbox.innerHTML; 
}


const htmlCode = [
  '<span style="font-weight:bold;color:blue;">ch</span>edilpakkam,tiruvallur, chedilpakkam,tiruvallur',
  '<span style="font-weight:bold;color:blue;">ch</span> edilpakkam,tiruvallur, chedilpakkam,tiruvallur',
  '<span style="font-weight:bold;color:blue;">ch</span> edilpakkam, tiruvallur,chedilpakkam, tiruvallur',
  '<span style="font-weight:bold;color:blue;">ch</span>edilpakkam, tiruvallur,chedilpakkam, tiruvallur',
].join('<br\/>');

document.body.innerHTML = capitalizeEachTextContentWordWithinCode(htmlCode);

console.log(document.body.innerHTML.split('<br>'));
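Since collectContentTextNodesRecursively only reads nodeType and childNodes, the collection step can also be tried outside a browser (e.g. in Node.js, which has no `document`) on plain objects that mimic those properties. This sketch swaps the recursive reduce for an iterative, stack-based depth-first walk; `el`, `text` and `collectTextNodes` are hypothetical names, not part of any API.

```javascript
// Hypothetical helpers building plain objects that mimic the DOM properties
// the collector relies on (nodeType 1 = element, 3 = text).
const el = (nodeName, ...childNodes) => ({ nodeType: 1, nodeName, childNodes });
const text = (textContent) => ({ nodeType: 3, nodeName: '#text', textContent });

// Iterative depth-first walk returning text nodes in document order.
function collectTextNodes(root) {
  const result = [];
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    if (node.nodeType === 3) {
      result.push(node);
    } else if (node.nodeType === 1) {
      // Push children in reverse so the first child is visited first.
      for (let i = node.childNodes.length - 1; i >= 0; i--) {
        stack.push(node.childNodes[i]);
      }
    }
  }
  return result;
}

// Mock of the sandbox after parsing the question's markup.
const sandbox = el('DIV', el('SPAN', text('ch')), text('edilpakkam,tiruvallur'));
console.log(collectTextNodes(sandbox).map(node => node.textContent));
// → [ 'ch', 'edilpakkam,tiruvallur' ]
```

In a browser, `document.createTreeWalker(sandbox, NodeFilter.SHOW_TEXT)` is a built-in alternative for visiting the same text nodes in document order.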

Upvotes: 1
