robots.txt

Reputation: 137

Can't fetch the name of different suppliers from a webpage

I've written a Python script that uses a POST request to fetch the names of different suppliers from a webpage. Unfortunately, I'm getting the error AttributeError: 'NoneType' object has no attribute 'text', even though, as far as I can tell, I've done everything correctly.

websitelink

To populate the content, you have to click the Search button, as shown in the image.


Here's what I've tried so far:

import requests
from bs4 import BeautifulSoup

url = "https://www.gebiz.gov.sg/ptn/supplier/directory/index.xhtml"

r = requests.get(url)
soup = BeautifulSoup(r.text,"lxml")

payload = {
    'contentForm': 'contentForm',
    'contentForm:j_idt225_listButton2_HIDDEN-INPUT': '',
    'contentForm:j_idt161_inputText': '',
    'contentForm:j_idt164_SEARCH': '',
    'contentForm:j_idt167_selectManyMenu_SEARCH-INPUT': '',
    'contentForm:j_idt167_selectManyMenu-HIDDEN-INPUT': '',
    'contentForm:j_idt167_selectManyMenu-HIDDEN-ACTION-INPUT': '',
    'contentForm:search': 'Search',
    'contentForm:j_idt185_select': 'SUPPLIER_NAME',
    'javax.faces.ViewState': soup.select_one('[id="javax.faces.ViewState"]')['value']
}

res = requests.post(url,data=payload,headers={
    'Content-Type': 'application/x-www-form-urlencoded',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'
    })
sauce = BeautifulSoup(res.text,"lxml")
item = sauce.select_one(".form2_ROW").text
print(item)

Getting just this portion of the page would be enough: 8121 results found.

Full traceback:

Traceback (most recent call last):
  File "C:\Users\WCS\AppData\Local\Programs\Python\Python37-32\general_demo.py", line 27, in <module>
    item = sauce.select_one(".form2_ROW").text
AttributeError: 'NoneType' object has no attribute 'text'

Upvotes: 0

Views: 90

Answers (1)

QHarr

Reputation: 84465

You need to find a way to obtain the cookie and send it with your requests. The following currently works for me across multiple requests.

import requests
from bs4 import BeautifulSoup

url = "https://www.gebiz.gov.sg/ptn/supplier/directory/index.xhtml"

headers = {
    'Content-Type': 'application/x-www-form-urlencoded',
    'User-Agent': 'Mozilla/5.0',
    'Referer': 'https://www.gebiz.gov.sg/ptn/supplier/directory/index.xhtml',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Cookie': '__cfduid=d3fe47b7a0a7f3ef307c266817231b5881555951761; wlsessionid=pFpF87sa9OCxQhUzwQ3lXcKzo04j45DP3lIVYylizkFMuIbGi6Ka!1395223647; BIGipServerPTN2_PRD_Pool=52519072.47873.0000'
}

with requests.Session() as s:
    # Load the page first to read the javax.faces.ViewState token from the form.
    r = s.get(url, headers=headers)
    soup = BeautifulSoup(r.text, "lxml")
    payload = {
        'contentForm': 'contentForm',
        'contentForm:search': 'Search',
        'contentForm:j_idt185_select': 'SUPPLIER_NAME',
        'javax.faces.ViewState': soup.select_one('[id="javax.faces.ViewState"]')['value']
    }
    # Submit the search form through the same session, with the same headers.
    res = s.post(url, data=payload, headers=headers)
    sauce = BeautifulSoup(res.text, "lxml")
    # Read the text of the first element matching the selector below.
    item = sauce.select_one(".formOutputText_HIDDEN-LABEL.outputText_TITLE-BLACK").text
    print(item)
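
As a side note, the AttributeError in the question comes from select_one returning None when the selector matches nothing in the response, after which .text is called on None. A minimal sketch of a guard is below. The helper name select_text and its default parameter are my own, not part of BeautifulSoup. It only assumes the object passed in has a select_one method, as a BeautifulSoup object does.

```python
from typing import Any, Optional


def select_text(soup: Any, selector: str, default: Optional[str] = None) -> Optional[str]:
    """Return the stripped text of the first match for `selector`, or `default`.

    `soup` is expected to be a BeautifulSoup object, or any object that
    exposes a compatible `select_one` method (hypothetical helper).
    """
    node = soup.select_one(selector)
    if node is None:
        # Nothing matched. Calling .text at this point is what raises
        # "'NoneType' object has no attribute 'text'".
        return default
    return node.text.strip()
```

With this in place, item = select_text(sauce, ".formOutputText_HIDDEN-LABEL.outputText_TITLE-BLACK") gives None instead of raising when the element is missing. In that case, printing res.status_code and the first part of res.text usually shows what the server actually sent back.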

Upvotes: 2
