GiveItAwayNow

Reputation: 447

How can I get a link from <a href="#" onClick=?

I'm parsing http://www.treccani.it/lingua_italiana/sinonimi_regionali/ using Python 3 and BeautifulSoup. I've parsed the first page and now need to move on to the second page, the third, and so on. Navigation to the next page is done through an image button:

<div class="next">
    <a href="#" onClick="doSearch(1, 4, 37); return false;" title="Pagina successiva">
        <img src="/export/system/modules/it.banzai.treccani.portale3/resources/images/arrow-right.png" />
    </a>
</div>

How can I get the link to the next page? Or how else can I move between pages using Python?

Upvotes: 0

Views: 329

Answers (2)

bobthemac

Reputation: 1172

The problem with BeautifulSoup is that it only sees the static HTML you give it. If the link is not in the HTML, you cannot get it with BeautifulSoup, because it is simply a parser and does not run the page's JavaScript.

As mentioned in the other answer, a good approach here is Selenium. You could also find the doSearch JavaScript function, work out what it is doing, and replicate it on the Python side, though that seems a little messy. After looking at the doSearch function, Selenium seems like your best shot.
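If you do want to try the "replicate it in Python" route, the first step would be pulling the arguments out of the onClick attribute. Below is a minimal sketch of that step only. The helper name `parse_do_search` is hypothetical, and what the three numbers mean (and which request doSearch eventually sends) is not known from the HTML alone, so you would still need to read the site's JavaScript. With BeautifulSoup you would probably read the attribute with something like `soup.select_one("div.next a").get("onclick")`, since parsers typically lowercase attribute names.

```python
import re

# Matches a call such as: doSearch(1, 4, 37)
DO_SEARCH_RE = re.compile(r"doSearch\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)")


def parse_do_search(onclick):
    """Return the three integer arguments of a doSearch(...) call, or None.

    Hypothetical helper: it only extracts the numbers; their meaning
    has to be worked out from the site's own doSearch function.
    """
    match = DO_SEARCH_RE.search(onclick)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


# Attribute value copied from the question's HTML snippet.
onclick = "doSearch(1, 4, 37); return false;"
print(parse_do_search(onclick))  # (1, 4, 37)
```

This gets you the raw values only; turning them into a URL or POST request depends entirely on what doSearch does, which is why it breaks as soon as the site changes its script.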

Upvotes: 1

Wayne Werner

Reputation: 51877

I think you're going to need a JavaScript engine rather than Beautiful Soup.

One good approach is browser automation via Selenium. The alternative is guessing: you would have to know what the doSearch function is actually doing, and if they change the JavaScript, your code will no longer do what you expect.
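A rough sketch of what the Selenium approach might look like, assuming the "next" link keeps the markup shown in the question. The CSS selector, the helper name `scrape_pages`, the fixed pause, and the choice of Firefox are all assumptions on my part, not something taken from the site. The string `"css selector"` is the value of Selenium's `By.CSS_SELECTOR`.

```python
import time

# Assumed selector, based on the <div class="next"> snippet in the question.
NEXT_LINK_CSS = 'div.next a[title="Pagina successiva"]'


def scrape_pages(driver, max_pages, pause=1.0):
    """Collect the HTML of up to max_pages pages by clicking the next link.

    Hypothetical helper: `driver` is a Selenium WebDriver that has
    already loaded the first page.
    """
    pages = []
    while len(pages) < max_pages:
        pages.append(driver.page_source)
        links = driver.find_elements("css selector", NEXT_LINK_CSS)
        if not links or len(pages) == max_pages:
            break  # no next link, or we already have enough pages
        links[0].click()
        time.sleep(pause)  # crude wait for the new results to load
    return pages


def main():
    # Imported here so the helper above can be used without a browser.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        driver.get("http://www.treccani.it/lingua_italiana/sinonimi_regionali/")
        for html in scrape_pages(driver, max_pages=3):
            print(len(html))  # hand each page to BeautifulSoup here instead
    finally:
        driver.quit()
```

Each string returned by `scrape_pages` can be passed to BeautifulSoup as before, so the existing parsing code stays the same. A fixed `time.sleep` is the simplest thing that might work; an explicit wait on some element of the results would be more robust.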

Upvotes: 1
