Reputation: 15
When I use XPath to crawl and parse the content of Tencent commonweal, all the returned lists are empty. My code is below (the headers information is hidden), and the target URL is https://gongyi.qq.com/succor/project_list.htm#s_tid=75. I would appreciate it if someone could help me solve this problem.
import requests
import os
from lxml import etree

if __name__ == '__main__':
    url = 'https://gongyi.qq.com/succor/project_list.htm#s_tid=75'
    headers = {
        'User-Agent': XXX  # actual User-Agent string hidden
    }
    response = requests.get(url=url, headers=headers)
    page_text = response.text
    tree = etree.HTML(page_text)
    li_list = tree.xpath('//div[@class="pro_main"]//li')
    for li in li_list:
        title = li.xpath('./div[2]/div/a/text()')[0]
        print(title)
Upvotes: 1
Views: 977
Reputation: 2304
What is actually happening here is that you can only access the first ul inside the pro_main div: all those li items and their parent are populated by JavaScript, so they are not yet in the HTML that requests.get() returns, and your XPath result comes back empty!
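You can confirm this with a quick check (a minimal sketch, reusing the same URL and XPath from the question): the static HTML that requests downloads contains none, or almost none, of the project li nodes.

import requests
from lxml import etree

# Fetch the page the same way the question does and count the matched nodes.
# Because the project list is injected by JavaScript after the page loads,
# this count stays at 0 (or only a few static placeholders), not the full list.
url = 'https://gongyi.qq.com/succor/project_list.htm#s_tid=75'
html = requests.get(url).text
tree = etree.HTML(html)
print(len(tree.xpath('//div[@class="pro_main"]//li')))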
The good news is that the JS script in question populates the data through an API, so you can retrieve those titles the same way the website does: call the API directly and print them.
import requests
import json

if __name__ == '__main__':
    url = 'https://ssl.gongyi.qq.com/cgi-bin/WXSearchCGI?ptype=stat&s_status=1&s_tid=75'
    resp = requests.get(url).text
    resp = resp[1:-1]  # The result is wrapped in (), so we get rid of those
    jj = json.loads(resp)
    for i in jj["plist"]:
        title = i["title"]
        print(title)
You can explore the API by printing jj to see if there's more info that you may need later!
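For instance, a minimal sketch of how you could dump the whole payload and list the fields of one project entry (same API URL as above; any fields beyond "plist" and "title" are simply whatever the API happens to return):

import requests
import json

url = 'https://ssl.gongyi.qq.com/cgi-bin/WXSearchCGI?ptype=stat&s_status=1&s_tid=75'
raw = requests.get(url).text
jj = json.loads(raw[1:-1])  # strip the wrapping () as before

# Pretty-print the full payload; ensure_ascii=False keeps the Chinese titles readable
print(json.dumps(jj, ensure_ascii=False, indent=2))

# Fields available on a single project entry ("title" plus whatever else the API exposes)
if jj.get("plist"):
    print(list(jj["plist"][0].keys()))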
Let me know if it works for you!
Upvotes: 1