thefragileomen

Reputation: 1547

Last hyperlink in webpage table using Python

I am using BeautifulSoup 4 to parse a webpage. Similar to how Bing works, if you enter a search term it returns the first ten hits, with subsequent hits listed on page 2, page 3, etc. The first page returned after the query contains hyperlinks from page 2 up to the very last page. What I am trying to establish is exactly which page that very last page is (e.g. page 87).

Below is a sample of the HTML source code from the page:

<tr><td colspan=4 align=left class='uilt'>����� ������� ��������: 3543.<br>��������: 1 <a href="/main/search.php?str=&tag=&nopass=&cat=25&page=2">2</a> <a href="/main/search.php?str=&tag=&nopass=&cat=25&page=3">3</a> <a href="/main/search.php?str=&tag=&nopass=&cat=25&page=4">4</a> <a href="/main/search.php?str=&tag=&nopass=&cat=25&page=5">5</a> <a href="/main/search.php?str=&tag=&nopass=&cat=25&page=6">6</a> <a href="/main/search.php?str=&tag=&nopass=&cat=25&page=7">7</a> <a href="/main/search.php?str=&tag=&nopass=&cat=25&page=8">8</a> <a href="/main/search.php?str=&tag=&nopass=&cat=25&page=9">9</a> <a href="/main/search.php?str=&tag=&nopass=&cat=25&page=10">10</a> <br></td></tr>

In the above example, how would I work out that the last link is page 10? There is further HTML after the above, so I can't simply take a fixed-size slice from the end of the HTML code.

Thanks

Upvotes: 1

Views: 64

Answers (3)

alecxe

Reputation: 473893

If you are asking how to find the last link in the provided HTML with BeautifulSoup, you can use a CSS selector:

soup.select('td.uilt > a')[-1]

Or, using find() and find_all():

soup.find('td', class_='uilt').find_all('a')[-1]

Though I'd agree with other participants in the topic that there is no need for BeautifulSoup here. Selenium itself is a powerful tool and has plenty of techniques for locating elements on a page.
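Putting the two selectors above together, here is a minimal runnable sketch against a trimmed copy of the pagination row from the question (the garbled text before the links is replaced with a placeholder, and only three of the links are kept). It also shows one way to pull the page number out of the link's query string with the standard library's `urllib.parse`:

```python
from bs4 import BeautifulSoup
from urllib.parse import urlparse, parse_qs

# A trimmed stand-in for the pagination row from the question.
html = '''<tr><td colspan=4 align=left class='uilt'>Pages: 1
<a href="/main/search.php?str=&tag=&nopass=&cat=25&page=2">2</a>
<a href="/main/search.php?str=&tag=&nopass=&cat=25&page=9">9</a>
<a href="/main/search.php?str=&tag=&nopass=&cat=25&page=10">10</a>
<br></td></tr>'''

soup = BeautifulSoup(html, "html.parser")

# The last <a> inside the td.uilt cell is the highest page link.
last = soup.select("td.uilt > a")[-1]
print(last.text)  # "10"

# The same number is available from the href's "page" query parameter.
page = int(parse_qs(urlparse(last["href"]).query)["page"][0])
print(page)  # 10
```

Reading the number from the `href` rather than the link text is slightly more robust if the site ever renders the last link as something like "last" instead of a digit.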

Upvotes: 2

techron

Reputation: 17

First inspect the HTML for a total result count. If the site reports one, you may be able to compute the last page number from it and link directly to the last page. If you can't find the last page number that way, you can crawl forward instead: on each search result page, follow the highest page link {1...10, 11...20, ...} until you reach the last page, then do your operation to find the last link on that page.
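The crawl-forward idea can be sketched as below. The `fetch_page` callable is a hypothetical stand-in for the real HTTP request (e.g. via `requests` or Selenium); here a fake fetcher simulating a 13-page result set with 10 page links visible at a time is used for illustration:

```python
import re

def last_page_linked(html):
    """Return the highest page= value linked from a results page, or None."""
    pages = [int(m) for m in re.findall(r'[?&]page=(\d+)"', html)]
    return max(pages) if pages else None

def find_last_page(fetch_page):
    """Follow the highest linked page until no higher page is linked."""
    current = 1
    while True:
        html = fetch_page(current)
        highest = last_page_linked(html)
        if highest is None or highest <= current:
            return current
        current = highest

# Hypothetical fetcher: 13 pages total, links up to 10 pages ahead,
# and (like the site in the question) no self-link for the current page.
def fake_fetch(n, total=13):
    return "".join(
        f'<a href="/search?page={i}">{i}</a>'
        for i in range(1, min(total, n + 10) + 1)
        if i != n
    )

print(find_last_page(fake_fetch))  # 13
```

Each iteration jumps straight to the highest visible page link, so the loop takes only a handful of requests even for a large result set.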

Upvotes: 0

Andrew Magee

Reputation: 6684

With raw Selenium you should be able to do something like this:

driver.find_elements_by_css_selector(".uilt a")[-1].text

This finds the last <a> tag that is a descendant of the element with class uilt and returns its text. No need for BeautifulSoup.

Upvotes: 2
