MLearner

Reputation: 164

How can I fetch the HTML elements within a bounding box using Playwright?

I have a full-page screenshot taken with Playwright and an object detection model that returns bounding box coordinates over it. I would like to fetch the text of the HTML element that lies within each bounding box.

So far, I am using Playwright's Python API like this:

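from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.webkit.launch()
    page = browser.new_page()
    page.goto(url)
    # Read the text of the element under the bounding-box centre
    element_text = page.evaluate("([x, y]) => document.elementFromPoint(x, y).textContent",
                                 [bbox_x_center, bbox_y_center])
    browser.close()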

But it seems that elements at coordinates further down the page are not found. Is it possible that the lower elements are outside the viewport and I need to scroll down to find them? More generally, is there an easier way to retrieve the element handles within a bounding box from a screenshot?

Upvotes: 1

Views: 1586

Answers (1)

MLearner

Reputation: 164

As the documentation says about document.elementFromPoint: https://developer.mozilla.org/en-US/docs/Web/API/Document/elementFromPoint

If the specified point is outside the visible bounds of the document or either coordinate is negative, the result is null.

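So you can call window.scrollTo beforehand to bring the point into the viewport, which should fix it: https://developer.mozilla.org/en-US/docs/Web/API/Window/scrollTo

Keep in mind that elementFromPoint takes coordinates relative to the viewport, so after scrolling, subtract window.scrollX and window.scrollY from the page coordinates.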
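A minimal sketch of the scroll-then-look-up approach, not a definitive implementation. The helper names bbox_center and text_at_page_point and the constant ELEMENT_TEXT_JS are hypothetical. It assumes page is a Playwright sync-API Page, that the bounding box is given as left/top/width/height, and that screenshot pixels map one-to-one to CSS pixels (device scale factor 1); with a different scale factor the coordinates would need to be divided by it first.

```python
# Sketch only: `page` is assumed to be a Playwright sync-API Page, and the
# coordinates are assumed to be CSS pixels measured from the top-left corner
# of the full-page screenshot (device scale factor 1).

# JavaScript run inside the page. It scrolls so the point is roughly centred,
# then converts page coordinates to viewport coordinates using the scroll
# offsets the browser actually applied (they are clamped at document edges).
ELEMENT_TEXT_JS = """([x, y]) => {
    window.scrollTo(x - window.innerWidth / 2, y - window.innerHeight / 2);
    const el = document.elementFromPoint(x - window.scrollX, y - window.scrollY);
    return el ? el.textContent : null;
}"""


def bbox_center(left, top, width, height):
    """Return the centre (x, y) of a box given as left/top/width/height."""
    return left + width / 2, top + height / 2


def text_at_page_point(page, page_x, page_y):
    """Return the text of the element at a page coordinate, or None if none."""
    return page.evaluate(ELEMENT_TEXT_JS, [page_x, page_y])
```

With the page object from the question, text_at_page_point(page, bbox_x_center, bbox_y_center) would take the place of the direct page.evaluate call. Returning null from the JavaScript side when no element is found avoids the error that reading textContent of null would otherwise raise.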

Upvotes: 2
