Reputation: 207
How do I capture TARGET from the following HTML sample with XPath and Puppeteer?
<div id="parent">
<div id="sibling_1"> Hello </div>
<div id="sibling_2"> Good </div>
TARGET
<div id="sibling_3"> Bye </div>
</div>
The following code gets me "Good" and "Bye", but I don't see a way to get TARGET.
let xpath = '//*[@id="sibling_1"]/following-sibling::*';
let elements = await page.$x(xpath);
for (const element of elements) {
  let xpathTextContent = await element.getProperty('textContent');
  let text = await xpathTextContent.jsonValue();
  console.log("Text: ", text);
}
Upvotes: 0
Views: 56
Reputation: 57344
If you don't need to use XPath in particular, a plain CSS selector with child node iteration works:
import puppeteer from "puppeteer"; // ^22.7.1
const html = `<div id="parent">
<div id="sibling_1"> Hello </div>
<div id="sibling_2"> Good </div>
TARGET
<div id="sibling_3"> Bye </div>
</div>`;
let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  await page.setContent(html);
  const text = await page.$eval("#parent", el =>
    [...el.childNodes]
      .find(
        e =>
          e.textContent.trim() && e.nodeType === Node.TEXT_NODE
      )
      .textContent.trim()
  );
  console.log(text); // => TARGET
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());
If you want all of the text nodes in cases where there are multiple:
[...el.childNodes]
  .filter(
    e =>
      e.textContent.trim() && e.nodeType === Node.TEXT_NODE
  )
  .map(e => e.textContent.trim())
  .join("") // optional, you may prefer an array
If your logic is that you want to select the next sibling after #sibling_2, then use:
const text = await page.$eval("#sibling_2", el =>
  el.nextSibling.textContent.trim()
);
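If you do want to stay with XPath, the reason the query in the question misses TARGET is that `following-sibling::*` matches element nodes only, while TARGET is a text node. The `text()` node test matches text nodes instead. Below is a minimal sketch that evaluates such an expression with the standard DOM `document.evaluate` inside the page. The helper names `textAfterXPath` and `getTextAfter` are made up for illustration, and `page` is assumed to be a Puppeteer `Page` that already has the sample HTML loaded.

```javascript
// Hypothetical helper: builds an XPath that yields the string value of the
// first non-blank text node that follows the element with the given id.
function textAfterXPath(id) {
  return `string(//*[@id="${id}"]/following-sibling::text()[normalize-space()][1])`;
}

// Hypothetical helper: `page` is assumed to be a Puppeteer Page.
// The callback runs in the browser, where `document` and `XPathResult` exist.
async function getTextAfter(page, id) {
  return page.evaluate(xpath => {
    const result = document.evaluate(
      xpath,
      document,
      null,
      XPathResult.STRING_TYPE, // ask for a plain string, not a node handle
      null
    );
    return result.stringValue.trim();
  }, textAfterXPath(id));
}
```

With the sample HTML from the question, `await getTextAfter(page, "sibling_2")` should give TARGET. Asking for a string result sidesteps the question of whether a text node can be returned as an element handle, since only the string is serialized back to Node.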
Upvotes: 0
Reputation: 207
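It turns out TARGET is a text node that belongs directly to the parent element, so it shows up in the parent's textContent (together with the text of the child divs):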
let xpath = '//*[@id="parent"]';
let elements = await page.$x(xpath);
let xpathTextContent = await elements[0].getProperty('textContent')
let text = await xpathTextContent.jsonValue();
Upvotes: 0
Reputation: 14145
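Here is a solution in plain JavaScript, to be run in the page context. Note that it returns all of the text under #parent, not only TARGET.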
document.querySelector('div#parent').innerText
Upvotes: 1