Han Zhengzu

Reputation: 3852

Table content is missing when scraping with BeautifulSoup

Here is an example of my situation.

The web page linked here has a table on its left side that I want to scrape with Python. The structure of the original HTML file is shown in this screenshot:

[image: HTML structure of the page]

The rows I need sit inside the element with id="companylist", so I wrote the following code to read them:

import requests
from bs4 import BeautifulSoup

url = 'http://182.148.109.184/gisnavigation!citysuriverPage.action?regioncode=510300#'
page = requests.get(url, headers={'Referer': url})
soup = BeautifulSoup(page.text, 'html.parser')
table = soup.find("tbody", {"id": "companylist"})

However, the result is just an empty tbody element with no useful information:

 [<tbody id="companylist">
 </tbody>]

Does anyone know how to retrieve the table's content?

Upvotes: 0

Views: 321

Answers (1)

radzak

Reputation: 3128

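As stated in the comments, the content is rendered by JavaScript running in your browser, so the HTML that requests downloads contains only the empty tbody. You can use Requests-HTML, which runs Chromium behind the scenes to execute the page's scripts. Note that the first call to render() downloads Chromium, so it may take a while.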
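A quick way to confirm that diagnosis is to count the rows present in the raw, unrendered HTML. The sketch below uses only the standard library; `RowCounter`, `count_rows`, and the `STATIC_HTML` sample are hypothetical names and data made up for illustration, not the real page's markup.

```python
from html.parser import HTMLParser


class RowCounter(HTMLParser):
    """Count <tr> elements nested inside the element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.target_tag = None  # tag name of the target element, once found
        self.depth = 0          # open same-named tags inside the target
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if self.target_tag is None:
            # Still looking for the element that carries the id.
            if dict(attrs).get("id") == self.target_id:
                self.target_tag = tag
                self.depth = 1
        elif self.depth > 0:
            # Inside the target: track nesting and count rows.
            if tag == self.target_tag:
                self.depth += 1
            if tag == "tr":
                self.rows += 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag == self.target_tag:
            self.depth -= 1


def count_rows(html, element_id):
    """Return how many <tr> elements the static HTML holds under element_id."""
    parser = RowCounter(element_id)
    parser.feed(html)
    return parser.rows


# Hypothetical stand-in for what the server sends before any script runs.
STATIC_HTML = """
<table>
  <thead><tr><th>Name</th><th>City</th><th>Type</th></tr></thead>
  <tbody id="companylist">
  </tbody>
</table>
<script>/* rows are appended to #companylist at load time */</script>
"""

print(count_rows(STATIC_HTML, "companylist"))
```

Passing `page.text` from the question in place of `STATIC_HTML` would run the same check against the real response. If the count is 0 there while the browser shows rows, the table is being filled in client-side and a JavaScript-capable client such as Requests-HTML is the next step.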

**Code:**

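from requests_html import HTMLSession

url = 'http://182.148.109.184/gisnavigation!citysuriverPage.action?regioncode=510300#'
session = HTMLSession()
r = session.get(url)
# Execute the page's JavaScript in headless Chromium so the table gets populated
r.html.render()

# Select the now-populated element by its id and print its text
table = r.html.find('#companylist')[0]
print(table.text)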

**Output:**

富顺首创水务有限公司
自贡市
污水厂
...
自贡张家坝氯碱化工有限责任...
自贡市
废气
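Since `table.text` comes back as one flat string, it may help to regroup it into rows. The following is a minimal sketch that assumes every row has exactly three non-empty cells (name, city, category), which is inferred from the output above rather than confirmed against the page. `rows_from_text` is a hypothetical helper, and `sample` reuses the first and last rows shown above.

```python
def rows_from_text(text, columns=3):
    """Group the non-empty lines of `text` into tuples of `columns` cells."""
    cells = [line.strip() for line in text.splitlines() if line.strip()]
    if len(cells) % columns:
        # A leftover cell means the fixed-width assumption does not hold.
        raise ValueError(
            f"{len(cells)} cells cannot be split into rows of {columns}"
        )
    return [tuple(cells[i:i + columns]) for i in range(0, len(cells), columns)]


# Two rows copied from the output above (the second name is truncated
# by the page itself, so it is kept as-is).
sample = """富顺首创水务有限公司
自贡市
污水厂
自贡张家坝氯碱化工有限责任...
自贡市
废气"""

for name, city, category in rows_from_text(sample):
    print(name, city, category, sep=" | ")
```

If some cells can be empty, this grouping would shift columns; iterating over the rows with `table.find('tr')` and reading each row's cells is likely the more robust route.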

Upvotes: 1
