Reputation: 783
For a project that makes a website's communications clearer, I have to pull the messages out of the page using a regex. Why a regex? The message is commented out in the HTML, so I can't reach it with the usual document.getElement... calls, but I can with the regex below.
I am trying to get a value using this expression:
\s*<td width="61%"class="valorCampoSinTamFijoPeque">(.|\n)*?<\/td>
How I use this expression:
var pulledmessage = /\s*<td width="61%"class="valorCampoSinTamFijoPeque">(.|\n)*?<\/td>/.exec(htmlDoc);
The expression above gives me null when I console.log() it. My guess is that the htmlDoc I pass to the regex is not in a format it can work with. I have no clue how to make it pull the value.
What I use to parse the HTML:
var html1 = httpGet(messages);
parser = new DOMParser();
htmlDoc = parser.parseFromString(html1,"text/html");
The result I want to get:
<td width="61%"class="valorCampoSinTamFijoPeque"><b>D.</b> De:
Information, Information.
Information, Information
Para: Information
CC: Information
Alot of text here ............
</td>
I edited the above value to remove personal information.
html1 contains a full HTML page with the information required.
Upvotes: 0
Views: 68
Reputation: 2351
New attempt. Since the td you need is commented out, remove all HTML comment delimiters from the loaded HTML string before parsing it. The td then becomes part of the parsed document, and you can use innerHTML to get the message content.
const
documentString = `
<!doctype html>
<html>
<body>
<div class="valorCampoSinTamFijoPeque">1</div>
<div class="valorCampoSinTamFijoPeque">2</div>
<div class="valorCampoSinTamFijoPeque">3</div>
<div class="valorCampoSinTamFijoPeque">4</div>
<div class="valorCampoSinTamFijoPeque">5</div>
<div class="valorCampoSinTamFijoPeque">6</div>
<!--<div class="valorCampoSinTamFijoPeque"><b>D.</b> De: Information, Information. Information, Information Para: Information CC: Information Alot of text here ............</div>-->
<div class="valorCampoSinTamFijoPeque">8</div>
</body>
</html>`,
outputElement = document.getElementById('output');
// Remove all comment delimiters from the input string.
const cleanupDocString = documentString.replace(/<!--|-->/g, '');
// Create a parser and construct a document based on the string. It should
// now contain 8 divs.
const parser = new DOMParser();
const htmlDoc = parser.parseFromString(cleanupDocString, "text/html");
const
// Get the 7th div with the class name from the parsed document.
element = htmlDoc.getElementsByClassName('valorCampoSinTamFijoPeque')[6];
// Log the element found in the parsed document.
console.log(element);
// Log the content from the element.
console.log(element.innerHTML);
<div id="output"></div>
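If all you need is the message text, the regex from the question can also work, provided it runs on the raw string (html1) rather than on the parsed document. RegExp exec converts its argument to a string, and a Document object stringifies to something like "[object HTMLDocument]", which is the likely reason for the null. The following is only a sketch: extractComments, tdPattern and the sample html1 value are hypothetical stand-ins, not code from the question.

```javascript
// Hypothetical helper (not from the original post): collects the text of
// every HTML comment found in a raw HTML string.
function extractComments(html) {
  // [\s\S]*? lazily matches any character, including newlines.
  const pattern = /<!--([\s\S]*?)-->/g;
  return Array.from(html.matchAll(pattern), (match) => match[1]);
}

// Assumed stand-in for the string returned by httpGet(messages).
const html1 = `<table><tr>
<!--<td width="61%"class="valorCampoSinTamFijoPeque"><b>D.</b> De:
Information, Information.
</td>-->
</tr></table>`;

// Same td pattern as in the question, with a capture group for the content.
const tdPattern = /<td width="61%"class="valorCampoSinTamFijoPeque">([\s\S]*?)<\/td>/;

// Look for the td inside each comment and keep the first hit.
const commented = extractComments(html1);
const found = commented
  .map((comment) => tdPattern.exec(comment))
  .find((match) => match !== null);
const message = found ? found[1] : null;

console.log(message);
```

This avoids the DOM entirely, so it only makes sense when the markup is as regular as the sample in the question; for anything less predictable, parsing the document is the safer route.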
Upvotes: 1
Reputation: 2351
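There is no need for a regex; native JS has your back! Comments are nodes in the parsed document, so you can read the comment's text straight from the DOM.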
const
documentString = '<!doctype html><html><body><div class="valorCampoSinTamFijoPeque">1</div><div class="valorCampoSinTamFijoPeque">2</div><div class="valorCampoSinTamFijoPeque">3</div><div class="valorCampoSinTamFijoPeque">4</div><div class="valorCampoSinTamFijoPeque">5</div><div class="valorCampoSinTamFijoPeque">6</div><div class="valorCampoSinTamFijoPeque">7<!--<b>D.</b> De: Information, Information. Information, Information Para: Information CC: Information Alot of text here ............--></div><div class="valorCampoSinTamFijoPeque">8</div></body></html>',
outputElement = document.getElementById('output');
function getCommentText(element) {
for (var index=0; index<element.childNodes.length;index++){
const
node = element.childNodes[index];
if (node.nodeType === Node.COMMENT_NODE) {
return node.data;
}
}
}
// Create a parser and construct a document based on the string. It should
// contain 8 divs.
const parser = new DOMParser();
const htmlDoc = parser.parseFromString(documentString, "text/html");
const
// Get the 7th div with the class name from the parsed document.
element = htmlDoc.getElementsByClassName('valorCampoSinTamFijoPeque')[6];
// Replace the HTML of the element with the content of the comment.
element.innerHTML = getCommentText(element);
// Take the inner HTML of the parsed document's body and place it inside the output
// element in the page that is visible in the user agent. The 7th div should not
// contain a number but the text that was originally in the comment.
outputElement.innerHTML = htmlDoc.body.innerHTML;
<div id="output"></div>
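getCommentText above only looks at the direct children of the element. If the comment might sit deeper in the tree, a recursive variant could be used instead. This is a sketch under assumptions: findCommentText is a hypothetical name, and the mock objects below merely imitate DOM nodes so the snippet also runs outside a browser.

```javascript
// Same value as Node.COMMENT_NODE in browsers.
const COMMENT_NODE = 8;

// Hypothetical helper: depth-first search for the first comment node below
// `node`. Returns the comment's text, or null when there is none.
function findCommentText(node) {
  for (const child of node.childNodes || []) {
    if (child.nodeType === COMMENT_NODE) {
      return child.data;
    }
    const nested = findCommentText(child);
    if (nested !== null) {
      return nested;
    }
  }
  return null;
}

// Mock tree imitating <div>7<span><!--hidden message--></span></div>.
const mockDiv = {
  nodeType: 1,
  childNodes: [
    { nodeType: 3, data: '7', childNodes: [] },
    {
      nodeType: 1,
      childNodes: [{ nodeType: 8, data: 'hidden message', childNodes: [] }],
    },
  ],
};

const commentText = findCommentText(mockDiv);
console.log(commentText);
```

In the browser you would pass the real element instead of the mock, and check for null before assigning the result to innerHTML, so that a div without a comment is left untouched.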
Upvotes: 0