robots.txt

Reputation: 137

Trouble fetching results from next pages using POST requests

I've written a script in Python to get the tabular data that is populated after filling in the two input boxes (From and Through) located at the top right corner of a webpage. The dates I filled in to generate the results are 08/28/2017 and 11/25/2018.

When I run the script below, I can get the tabular results from the first page.

However, the data are spread across multiple pages via pagination, and the URL remains unchanged. How can I get the content of the next pages?

URL to the site: https://www.myfloridalicense.com/FLABTBeerPricePosting/

This is my attempt:

import requests
from bs4 import BeautifulSoup

url = "https://www.myfloridalicense.com/FLABTBeerPricePosting/"

res = requests.get(url)
soup = BeautifulSoup(res.text, "lxml")

# Scrape the hidden ASP.NET state fields needed for the postback.
try:
    evtrgt = soup.select_one("#__EVENTTARGET").get('value')
except AttributeError:
    evtrgt = ""

viewstate = soup.select_one("#__VIEWSTATE").get('value')
viewgen = soup.select_one("#__VIEWSTATEGENERATOR").get('value')
eventval = soup.select_one("#__EVENTVALIDATION").get('value')

payload = {
    '__EVENTTARGET': evtrgt,
    '__EVENTARGUMENT': '',
    '__VIEWSTATE': viewstate,
    '__VIEWSTATEGENERATOR': viewgen,
    '__VIEWSTATEENCRYPTED': '',
    '__EVENTVALIDATION': eventval,
    'ctl00$MainContent$txtPermitNo': '',
    'ctl00$MainContent$txtPermitName': '',
    'ctl00$MainContent$txtBrandName': '',
    'ctl00$MainContent$txtPeriodBeginDt': '08/28/2017',
    'ctl00$MainContent$txtPeriodEndingDt': '11/25/2018',
    'ctl00$MainContent$btnSearch': 'Search'
}

with requests.Session() as s:
    s.headers["User-Agent"] = "Mozilla/5.0"
    # Submit the search form and parse the first page of results.
    req = s.post(url, data=payload, cookies=res.cookies.get_dict())
    sauce = BeautifulSoup(req.text, "lxml")
    for row in sauce.select("#MainContent_gvBRCSummary tr"):
        data = [cell.get_text(strip=True) for cell in row.select("th,td")]
        print(data)

Any help solving this will be highly appreciated. To restate: the data I wish to grab is the tabular content from the site's subsequent pages; my script can already parse the data from the first page.

P.S.: A browser simulator is not an option I would like to use.

Upvotes: 0

Views: 75

Answers (1)

Martin Evans

Reputation: 46759

You need to add a loop over the pages and assign the requested page number to the __EVENTARGUMENT parameter. The pager on this ASP.NET page triggers a postback whose event argument is Page$N, where N is the page number:

import requests
from bs4 import BeautifulSoup

url = "https://www.myfloridalicense.com/FLABTBeerPricePosting/"

res = requests.get(url)
soup = BeautifulSoup(res.text, "lxml")

# Scrape the hidden ASP.NET state fields needed for the postback.
try:
    evtrgt = soup.select_one("#__EVENTTARGET").get('value')
except AttributeError:
    evtrgt = ""

viewstate = soup.select_one("#__VIEWSTATE").get('value')
viewgen = soup.select_one("#__VIEWSTATEGENERATOR").get('value')
eventval = soup.select_one("#__EVENTVALIDATION").get('value')

payload = {
    '__EVENTTARGET': evtrgt,
    '__EVENTARGUMENT': '',
    '__VIEWSTATE': viewstate,
    '__VIEWSTATEGENERATOR': viewgen,
    '__VIEWSTATEENCRYPTED': '',
    '__EVENTVALIDATION': eventval,
    'ctl00$MainContent$txtPermitNo': '',
    'ctl00$MainContent$txtPermitName': '',
    'ctl00$MainContent$txtBrandName': '',
    'ctl00$MainContent$txtPeriodBeginDt': '08/28/2017',
    'ctl00$MainContent$txtPeriodEndingDt': '11/25/2018',
    'ctl00$MainContent$btnSearch': 'Search'
}

for page in range(1, 12):
    with requests.Session() as s:
        s.headers["User-Agent"] = "Mozilla/5.0"
        # Request each page of results by passing Page$N as the event argument.
        payload['__EVENTARGUMENT'] = f'Page${page}'
        req = s.post(url, data=payload, cookies=res.cookies.get_dict())
        sauce = BeautifulSoup(req.text, "lxml")

        for row in sauce.select("#MainContent_gvBRCSummary tr"):
            data = [cell.get_text(strip=True) for cell in row.select("th,td")]
            print(data)
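
The hard-coded range(1, 12) matches the number of result pages for these particular dates. If you don't want to hard-code the page count, one option is to keep requesting pages until the pager no longer advertises a link for the next page number. The sketch below is untested against this site: in particular, the GridView's unique ID ('ctl00$MainContent$gvBRCSummary') is only inferred from the element ID MainContent_gvBRCSummary in the rendered markup, the 'Page$N' pattern in the pager links is assumed, and dropping the search-button field on paging postbacks mimics a pager click rather than a button click. The hidden state fields are re-read from every response in case the server rotates them:

import requests
from bs4 import BeautifulSoup

url = "https://www.myfloridalicense.com/FLABTBeerPricePosting/"

def hidden_fields(soup):
    # Re-collect the ASP.NET hidden state fields from the current page,
    # since __VIEWSTATE/__EVENTVALIDATION can change on every response.
    fields = {}
    for name in ("__VIEWSTATE", "__VIEWSTATEGENERATOR", "__EVENTVALIDATION"):
        node = soup.select_one(f"#{name}")
        fields[name] = node.get("value") if node else ""
    return fields

with requests.Session() as s:
    s.headers["User-Agent"] = "Mozilla/5.0"
    soup = BeautifulSoup(s.get(url).text, "lxml")

    payload = {
        '__EVENTTARGET': '',
        '__EVENTARGUMENT': '',
        '__VIEWSTATEENCRYPTED': '',
        'ctl00$MainContent$txtPermitNo': '',
        'ctl00$MainContent$txtPermitName': '',
        'ctl00$MainContent$txtBrandName': '',
        'ctl00$MainContent$txtPeriodBeginDt': '08/28/2017',
        'ctl00$MainContent$txtPeriodEndingDt': '11/25/2018',
        'ctl00$MainContent$btnSearch': 'Search'
    }
    payload.update(hidden_fields(soup))

    page = 1
    while True:
        res = s.post(url, data=payload)
        soup = BeautifulSoup(res.text, "lxml")
        for row in soup.select("#MainContent_gvBRCSummary tr"):
            print([cell.get_text(strip=True) for cell in row.select("th,td")])

        page += 1
        # Stop when the pager no longer offers a link for the next page number.
        if not soup.select_one(f"a[href*='Page${page}']"):
            break

        # Turn the payload into a paging postback: fresh state fields,
        # the grid as the event target (assumed ID), and no button field.
        payload.update(hidden_fields(soup))
        payload['__EVENTTARGET'] = 'ctl00$MainContent$gvBRCSummary'
        payload['__EVENTARGUMENT'] = f'Page${page}'
        payload.pop('ctl00$MainContent$btnSearch', None)

If the assumed event target or pager pattern doesn't match, inspect the href of the pager links in the first response; they normally contain the exact __doPostBack arguments to use.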

Upvotes: 1
