Tayne

Reputation: 101

Python Playwright API get hrefs

I am trying to collect the href of every link inside a series of HTML element blocks. I don't know how to target the href attribute in a selector, but I do know that all of the hrefs begin with "/wiki/".

Is there a way to query the page for all hrefs that start with this prefix?

Upvotes: 3

Views: 7218

Answers (2)

ggorlen

Reputation: 57344

Nowadays, locators are preferred since they'll auto-wait for the elements to be attached:

wiki_links = page.locator('a[href^="/wiki/"]').evaluate_all(
    "els => els.map(el => el.href)"
)

Inside the callback, you can also use el.getAttribute("href") rather than el.href if you don't want the base URL included.
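To make that difference concrete: el.href gives the link resolved against the page's URL, while getAttribute("href") gives the attribute text exactly as written in the HTML. Here is a rough sketch of that resolution in plain Python, using urllib.parse.urljoin as a stand-in for what the browser does. The page URL and the link are made-up values for illustration, not taken from the question.

```python
from urllib.parse import urljoin

# Hypothetical values, for illustration only.
page_url = "https://en.wikipedia.org/wiki/Main_Page"
raw_href = "/wiki/Python"  # roughly what el.getAttribute("href") returns

# Roughly what el.href returns: the raw value resolved against the page URL.
resolved_href = urljoin(page_url, raw_href)

print(raw_href)
print(resolved_href)
```

So if you plan to visit the links later, the resolved form is usually more convenient. If you only want to compare against the "/wiki/" prefix, the raw attribute is enough.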

Upvotes: 0

Max Schmitt

Reputation: 3222

You can do:

hrefs_of_page = page.eval_on_selector_all("a[href^='/wiki/']", "elements => elements.map(element => element.href)")

which should work for your use case. This looks up all link tags whose href attribute starts with /wiki/. The JavaScript is then evaluated in the browser, where it maps the array of elements to their href values, so a list of strings is returned on the Python side.
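For context, here is a minimal sketch of how that call might sit in a complete script with the sync API. The function name collect_wiki_hrefs and the constant WIKI_LINK_SELECTOR are names I made up for this example, and it assumes Playwright and a Chromium build are installed (pip install playwright, then playwright install chromium).

```python
# Attribute selector: <a> tags whose href starts with "/wiki/".
WIKI_LINK_SELECTOR = "a[href^='/wiki/']"


def collect_wiki_hrefs(url: str) -> list[str]:
    """Open `url` in headless Chromium and return the matching hrefs."""
    # Imported here so the module still loads if Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(url)
            # The JavaScript runs in the browser; the resulting array of
            # strings comes back to Python as a list.
            return page.eval_on_selector_all(
                WIKI_LINK_SELECTOR,
                "elements => elements.map(element => element.href)",
            )
        finally:
            browser.close()
```

You would call it with the page you are scraping, for example collect_wiki_hrefs("https://en.wikipedia.org/wiki/Main_Page"), where that URL is only a placeholder.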

Upvotes: 5
