Extracting particular text

Question

I am trying to extract all links to videos on a particular WordPress website. Each page has only one video.

Inside each page crawled, there is the following code:

I would like to extract the text from here

Google Chrome Inspector tells me that this can be addressed as:

But each page I crawl has a different "post" number. The numbers look random, so I cannot simply reuse the selector above on every page.

alecxe · Accepted Answer

If part of the id attribute is dynamic, you can match on the fixed part of its value instead. As a CSS selector:

[id^=post] > div > p > iframe

where ^= means "starts with".

XPath alternative:

//*[starts-with(@id, "post")]/div/p/iframe

Also check whether you can skip the intermediate div and p elements altogether and match any iframe descendant (CSS first, then XPath):

[id^=post] iframe
//*[starts-with(@id, "post")]//iframe

You can additionally check the iframe's name attribute (again CSS, then XPath):

[id^=post] iframe[name=vooplayerframe]
//*[starts-with(@id, "post")]//iframe[@name = "vooplayerframe"]
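To make the matching rule concrete, here is a minimal sketch in Python that uses only the standard library. It does not evaluate the CSS or XPath expressions themselves; it emulates what `[id^=post] iframe` selects by walking the markup and tracking whether any open ancestor has an id starting with "post". The class and function names, the sample id `post-4821`, and the example.com URLs are all made up for illustration, and it assumes the value you want is the iframe's src attribute.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not tracked as open ancestors.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PostIframeFinder(HTMLParser):
    """Collect the src of every <iframe> nested inside an element whose id
    starts with a given prefix (hypothetical helper, not a library API)."""

    def __init__(self, id_prefix="post", iframe_name=None):
        super().__init__()
        self.id_prefix = id_prefix
        self.iframe_name = iframe_name  # optional extra check on the name attribute
        self.stack = []    # (tag, id_matches_prefix) for each open element
        self.sources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        inside_post = any(matches for _, matches in self.stack)
        if tag == "iframe" and inside_post:
            if self.iframe_name is None or attrs.get("name") == self.iframe_name:
                self.sources.append(attrs.get("src"))
        if tag not in VOID_TAGS:
            matches = (attrs.get("id") or "").startswith(self.id_prefix)
            self.stack.append((tag, matches))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; tolerates unclosed <p>, <li>, etc.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def find_video_sources(html, iframe_name=None):
    finder = PostIframeFinder(iframe_name=iframe_name)
    finder.feed(html)
    finder.close()
    return finder.sources


# Made-up markup standing in for one crawled page.
SAMPLE = """
<article id="post-4821">
  <div><p><iframe name="vooplayerframe" src="https://example.com/video/1"></iframe></p></div>
</article>
<aside id="sidebar"><iframe src="https://example.com/ad"></iframe></aside>
"""

if __name__ == "__main__":
    print(find_video_sources(SAMPLE))
    print(find_video_sources(SAMPLE, iframe_name="vooplayerframe"))
```

If your crawler is Scrapy (an assumption, since the question does not say), none of this is needed: the selectors can be passed directly, for example `response.css('[id^=post] iframe::attr(src)').get()` or `response.xpath('//*[starts-with(@id, "post")]//iframe/@src').get()`.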
