Reputation: 137
I've written a script in python to get the tabular data populated upon filling in two input boxes (From
and Through
) located at the top right corner of a webpage. The date I filled in to generate results are 08/28/2017
and 11/25/2018
.
When I run my following script, I can get the tabular results from it's first page.
However, the data have spread across multiple pages through pagination and the url remains unchanged. How can I get the next page content?
This is my attempt:
import requests
from bs4 import BeautifulSoup
url = "https://www.myfloridalicense.com/FLABTBeerPricePosting/"
res = requests.get(url)
soup = BeautifulSoup(res.text,"lxml")
try:
evtrgt = soup.select_one("#__EVENTTARGET").get('value')
except AttributeError: evtrgt = ""
viewstate = soup.select_one("#__VIEWSTATE").get('value')
viewgen = soup.select_one("#__VIEWSTATEGENERATOR").get('value')
eventval = soup.select_one("#__EVENTVALIDATION").get('value')
payload = {
'__EVENTTARGET': evtrgt,
'__EVENTARGUMENT': '',
'__VIEWSTATE':viewstate,
'__VIEWSTATEGENERATOR':viewgen,
'__VIEWSTATEENCRYPTED':'',
'__EVENTVALIDATION':eventval,
'ctl00$MainContent$txtPermitNo':'',
'ctl00$MainContent$txtPermitName': '',
'ctl00$MainContent$txtBrandName':'',
'ctl00$MainContent$txtPeriodBeginDt':'08/28/2017',
'ctl00$MainContent$txtPeriodEndingDt':'11/25/2018',
'ctl00$MainContent$btnSearch': 'Search'
}
with requests.Session() as s:
s.headers["User-Agent"] = "Mozilla/5.0"
req = s.post(url,data=payload,cookies=res.cookies.get_dict())
sauce = BeautifulSoup(req.text,"lxml")
for items in sauce.select("#MainContent_gvBRCSummary tr"):
data = [item.get_text(strip=True) for item in items.select("th,td")]
print(data)
Any help to solve the issue will be highly appreciated. Once again: the data I wish to grab are the tabular content from the site's next pages as my script can already parse the data from it's first page?
P.S.: Browser simulator is not an option I would like to cope with.
Upvotes: 0
Views: 75
Reputation: 46759
You need to add a loop for each page and assign the requested page number to the __EVENTARGUMENT
parameter as follows:
import requests
from bs4 import BeautifulSoup
url = "https://www.myfloridalicense.com/FLABTBeerPricePosting/"
res = requests.get(url)
soup = BeautifulSoup(res.text,"lxml")
try:
evtrgt = soup.select_one("#__EVENTTARGET").get('value')
except AttributeError:
evtrgt = ""
viewstate = soup.select_one("#__VIEWSTATE").get('value')
viewgen = soup.select_one("#__VIEWSTATEGENERATOR").get('value')
eventval = soup.select_one("#__EVENTVALIDATION").get('value')
payload = {
'__EVENTTARGET' : evtrgt,
'__EVENTARGUMENT' : '',
'__VIEWSTATE' : viewstate,
'__VIEWSTATEGENERATOR' : viewgen,
'__VIEWSTATEENCRYPTED' : '',
'__EVENTVALIDATION' : eventval,
'ctl00$MainContent$txtPermitNo' : '',
'ctl00$MainContent$txtPermitName' : '',
'ctl00$MainContent$txtBrandName' : '',
'ctl00$MainContent$txtPeriodBeginDt' : '08/28/2017',
'ctl00$MainContent$txtPeriodEndingDt' : '11/25/2018',
'ctl00$MainContent$btnSearch': 'Search'
}
for page in range(1, 12):
with requests.Session() as s:
s.headers["User-Agent"] = "Mozilla/5.0"
payload['__EVENTARGUMENT'] = f'Page${page}'
req = s.post(url,data=payload,cookies=res.cookies.get_dict())
sauce = BeautifulSoup(req.text, "lxml")
for items in sauce.select("#MainContent_gvBRCSummary tr"):
data = [item.get_text(strip=True) for item in items.select("th,td")]
print(data)
Upvotes: 1