Reputation: 137
I've created a script in python using post requests to fetch the name of different suppliers from a webpage but unfortunately I'm getting this error AttributeError: 'NoneType' object has no attribute 'text'
whereas it occured to me that I did things in the right way.
To populate the content, it is required to click on the search button just the way it is seen in the image.
I've tried so far:
import requests
from bs4 import BeautifulSoup
url = "https://www.gebiz.gov.sg/ptn/supplier/directory/index.xhtml"
r = requests.get(url)
soup = BeautifulSoup(r.text,"lxml")
payload = {
'contentForm': 'contentForm',
'contentForm:j_idt225_listButton2_HIDDEN-INPUT': '',
'contentForm:j_idt161_inputText': '',
'contentForm:j_idt164_SEARCH': '',
'contentForm:j_idt167_selectManyMenu_SEARCH-INPUT': '',
'contentForm:j_idt167_selectManyMenu-HIDDEN-INPUT': '',
'contentForm:j_idt167_selectManyMenu-HIDDEN-ACTION-INPUT': '',
'contentForm:search': 'Search',
'contentForm:j_idt185_select': 'SUPPLIER_NAME',
'javax.faces.ViewState': soup.select_one('[id="javax.faces.ViewState"]')['value']
}
res = requests.post(url,data=payload,headers={
'Content-Type': 'application/x-www-form-urlencoded',
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'
})
sauce = BeautifulSoup(res.text,"lxml")
item = sauce.select_one(".form2_ROW").text
print(item)
Only this portion will do as well:
8121 results found.
Full traceback:
Traceback (most recent call last):
File "C:\Users\WCS\AppData\Local\Programs\Python\Python37-32\general_demo.py", line 27, in <module>
item = sauce.select_one(".form2_ROW").text
AttributeError: 'NoneType' object has no attribute 'text'
Upvotes: 0
Views: 90
Reputation: 84465
You need to find a way to get the cookie. The following currently works for me across multiple requests.
import requests
from bs4 import BeautifulSoup
url = "https://www.gebiz.gov.sg/ptn/supplier/directory/index.xhtml"
headers = {
'Content-Type': 'application/x-www-form-urlencoded',
'User-Agent': 'Mozilla/5.0',
'Referer' : 'https://www.gebiz.gov.sg/ptn/supplier/directory/index.xhtml',
'Accept' : 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'Accept-Encoding' : 'gzip, deflate, br',
'Accept-Language' : 'en-US,en;q=0.9',
'Cache-Control' : 'max-age=0',
'Connection' : 'keep-alive',
'Cookie' : '__cfduid=d3fe47b7a0a7f3ef307c266817231b5881555951761; wlsessionid=pFpF87sa9OCxQhUzwQ3lXcKzo04j45DP3lIVYylizkFMuIbGi6Ka!1395223647; BIGipServerPTN2_PRD_Pool=52519072.47873.0000'
}
with requests.Session() as s:
r = s.get(url, headers= headers)
soup = BeautifulSoup(r.text,"lxml")
payload = {
'contentForm': 'contentForm',
'contentForm:search': 'Search',
'contentForm:j_idt185_select': 'SUPPLIER_NAME',
'javax.faces.ViewState': soup.select_one('[id="javax.faces.ViewState"]')['value']
}
res = s.post(url,data=payload,headers= headers)
sauce = BeautifulSoup(res.text,"lxml")
item = sauce.select_one(".formOutputText_HIDDEN-LABEL.outputText_TITLE-BLACK").text
print(item)
Upvotes: 2