Parsing different pages with urlparse

Question

I am trying to parse multiple pages of a website but I can't understand how to change the query of the url (if this makes sense?)

I tried to create a next_page that took the first page and added +1 everytime it found the next page element, but I think I can't because I'll have multiple start urls (all similar). When i try to get the information of the next page element it returns this:

["loadmoreresult('?networkId=24&pageNumber=2&pageSize=100&allnet=yes&networkIds=1&networkIds=2&networkIds=3&networkIds=4&networkIds=61&networkIds=98&networkIds=108&networkIds=6&networkIds=5&networkIds=22&networkIds=13&networkIds=18&networkIds=15&networkIds=16&networkIds=105&networkIds=38&licenseIds=0&licenseIds=0&licenseIds=0&licenseIds=0&licenseIds=0&searchby=CountryCode&orderby=CountryCity&country=ES&city=&keyword=&lastCid=116490'); return false;"]

Using url.parse(response.url).query I get:

'networkId=24&pageNumber=1&pageSize=100&allnet=yes&networkIds=1&networkIds=2&networkIds=3&networkIds=4&networkIds=61&networkIds=98&networkIds=108&networkIds=6&networkIds=5&networkIds=22&networkIds=13&networkIds=18&networkIds=15&networkIds=16&networkIds=105&networkIds=38&licenseIds=0&licenseIds=0&licenseIds=0&licenseIds=0&licenseIds=0&searchby=CountryCode&orderby=CountryCity&country=ES&city=&keyword='

All I need to do is create a new link that uses the same scheme, path and then changes the query.

If you need more info please tell me, I don't really know what is more relevant to you as I am still a beginner.

from urllib.parse import urlparse, urljoin

urlparse(response.url)
>>> ParseResult(scheme='https', netloc='www.wcaworld.com', path='/Directory', params='', query='networkId=24&pageNumber=1&pageSize=100&allnet=yes&networkIds=1&networkIds=2&networkIds=3&networkIds=4&networkIds=61&networkIds=98&networkIds=108&networkIds=6&networkIds=5&networkIds=22&networkIds=13&networkIds=18&networkIds=15&networkIds=16&networkIds=105&networkIds=38&licenseIds=0&licenseIds=0&licenseIds=0&licenseIds=0&licenseIds=0&searchby=CountryCode&orderby=CountryCity&country=ES&city=&keyword=', fragment='')

response.css('a.loadmore::attr(onmouseover)').extract()
>>>["loadmoreresult('?networkId=24&pageNumber=2&pageSize=100&allnet=yes&networkIds=1&networkIds=2&networkIds=3&networkIds=4&networkIds=61&networkIds=98&networkIds=108&networkIds=6&networkIds=5&networkIds=22&networkIds=13&networkIds=18&networkIds=15&networkIds=16&networkIds=105&networkIds=38&licenseIds=0&licenseIds=0&licenseIds=0&licenseIds=0&licenseIds=0&searchby=CountryCode&orderby=CountryCity&country=ES&city=&keyword=&lastCid=116490'); return false;"]

Parsing different pages with urlparse

Answers (1)

Related Questions