Granny

Reputation: 783

Get value from parsed HTML using Regex

For a project to make communications clearer for a website, I have to pull the messages using regex. (Why? Because the message is commented out, so I can't reach it with the normal document.getElement... methods, but I can with the regex below.)

I am trying to get a value using this expression:

\s*<td width="61%"class="valorCampoSinTamFijoPeque">(.|\n)*?<\/td>

How I use this expression:

var pulledmessage = /\s*<td width="61%"class="valorCampoSinTamFijoPeque">(.|\n)*?<\/td>/.exec(htmlDoc);

The above expression gives me null when I console.log() it. My guess is that the htmlDoc I pass to the regex is not in a format it can work with. I have no clue how to get the value pulled.

What I use to parse the HTML:

var html1 = httpGet(messages);

parser = new DOMParser();

htmlDoc = parser.parseFromString(html1,"text/html");

The result I want to get:

<td width="61%"class="valorCampoSinTamFijoPeque"><b>D.</b> De: 
Information, Information. 
Information, Information
Para: Information
CC: Information
Alot of text here ............
</td>

I edited the above value to remove personal information.

html1 contains a full HTML page with the information required.


Upvotes: 0

Views: 68

Answers (2)

Thijs

Reputation: 2351

New attempt. Since the td you need is commented out, remove all HTML comment delimiters from the loaded HTML string before parsing it. The td then becomes part of the parsed document, and you can use innerHTML to get the message content.

const 
  documentString = `
  <!doctype html>
    <html>
    <body>
      <div class="valorCampoSinTamFijoPeque">1</div>
      <div class="valorCampoSinTamFijoPeque">2</div>
      <div class="valorCampoSinTamFijoPeque">3</div>
      <div class="valorCampoSinTamFijoPeque">4</div>
      <div class="valorCampoSinTamFijoPeque">5</div>
      <div class="valorCampoSinTamFijoPeque">6</div>
      <!--<div class="valorCampoSinTamFijoPeque"><b>D.</b> De: Information, Information. Information, Information Para: Information CC: Information Alot of text here ............</div>-->
      <div class="valorCampoSinTamFijoPeque">8</div>
      </body>
    </html>`,
  outputElement = document.getElementById('output');

const
  // Remove all comment delimiters from the input string.
  cleanupDocString = documentString.replace(/<!--|-->/g, '');
// Create a parser and construct a document based on the cleaned-up string.
// It should contain 8 divs.
const parser = new DOMParser();
const htmlDoc = parser.parseFromString(cleanupDocString, "text/html");

const
  // Get the 7th div with the class name from the parsed document.
  element = htmlDoc.getElementsByClassName('valorCampoSinTamFijoPeque')[6];

// Log the element found in the parsed document.
console.log(element);
// Log the content from the element.
console.log(element.innerHTML);
<div id="output"></div>
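
Stripping every comment delimiter also revives any other commented-out markup on the page. As a rough sketch of a narrower variant, the hypothetical helper below (unwrapCommentsContaining is not part of any library, and the sample markup is invented for illustration) unwraps only the comments whose body contains a given marker string and leaves the rest alone. It works on plain strings, so it can run before the string is handed to DOMParser.

```javascript
// Hypothetical helper: unwrap only the comments whose body mentions `marker`,
// leaving every other comment in place.
function unwrapCommentsContaining(html, marker) {
  // [\s\S]*? lazily matches any characters, including newlines, up to the
  // nearest closing delimiter.
  return html.replace(/<!--([\s\S]*?)-->/g, (whole, body) =>
    body.includes(marker) ? body : whole
  );
}

// Invented sample input, loosely modelled on the markup in the question.
const sample = [
  '<!-- layout note -->',
  '<table><tr>',
  '<!--<td width="61%"class="valorCampoSinTamFijoPeque"><b>D.</b> De: Information</td>-->',
  '</tr></table>',
].join('\n');

const unwrapped = unwrapCommentsContaining(sample, 'valorCampoSinTamFijoPeque');
console.log(unwrapped);
```

The result can be passed to parser.parseFromString(unwrapped, "text/html") in the same way as cleanupDocString above. This assumes comments are not nested and that the marker does not appear in comments that should stay hidden.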

Upvotes: 1

Thijs

Reputation: 2351

There is no need for a regex, native JS has your back!

const 
  documentString = '<!doctype html><html><body><div class="valorCampoSinTamFijoPeque">1</div><div class="valorCampoSinTamFijoPeque">2</div><div class="valorCampoSinTamFijoPeque">3</div><div class="valorCampoSinTamFijoPeque">4</div><div class="valorCampoSinTamFijoPeque">5</div><div class="valorCampoSinTamFijoPeque">6</div><div class="valorCampoSinTamFijoPeque">7<!--<b>D.</b> De: Information, Information. Information, Information Para: Information CC: Information Alot of text here ............--></div><div class="valorCampoSinTamFijoPeque">8</div></body></html>',
  outputElement = document.getElementById('output');
  

// Return the text of the first comment node directly inside the element,
// or an empty string when the element has no comment child.
function getCommentText(element) {
  for (const node of element.childNodes) {
    if (node.nodeType === Node.COMMENT_NODE) {
      return node.data;
    }
  }
  return '';
}

// Create a parser and construct a document based on the string. It should
// contain 8 divs.
const parser = new DOMParser();
const htmlDoc = parser.parseFromString(documentString, "text/html");

const
  // Get the 7th div with the class name from the parsed document.
  element = htmlDoc.getElementsByClassName('valorCampoSinTamFijoPeque')[6];

// Replace the HTML of the element with the content of the comment.
element.innerHTML = getCommentText(element);

// Take the inner HTML of the parsed document's body and place it inside the output
// element in the page that is visible in the user agent. The 7th div should not
// contain a number but the text that was originally in the comment.
outputElement.innerHTML = htmlDoc.body.innerHTML;
<div id="output"></div>
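
If a regex is still preferred, for example where no DOM is available, one likely reason the expression in the question returned null is that exec() converts its argument to a string, and a parsed document converts to something like "[object HTMLDocument]" instead of its markup. The sketch below runs a slightly adjusted pattern against the raw string. The html1 value here is an invented stand-in for what httpGet(messages) returns, and the capture group is an addition so that only the cell content is returned.

```javascript
// Invented stand-in for the string returned by httpGet(messages) in the question.
const html1 = `<table><tr>
<!--
<td width="61%"class="valorCampoSinTamFijoPeque"><b>D.</b> De:
Information, Information.
</td>
-->
</tr></table>`;

// [\s\S] matches any character including newlines; the capture group holds
// the content between the opening and closing td tags.
const cellPattern = /<td width="61%"class="valorCampoSinTamFijoPeque">([\s\S]*?)<\/td>/;

// Match against the raw string, not against the parsed document.
const match = cellPattern.exec(html1);
const pulledMessage = match ? match[1].trim() : null;
console.log(pulledMessage);

// A non-string argument is matched by its string form. This plain object
// imitates how a parsed document stringifies in a browser.
const notAString = { toString() { return '[object HTMLDocument]'; } };
console.log(cellPattern.exec(notAString)); // null
```

The pattern still depends on the exact attribute text of the opening tag, so any change in the page's markup will break it. That fragility is the main reason the DOM-based approach above is usually the safer choice.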

Upvotes: 0
