Effective CJP

Reputation: 15

Why does my request return an empty list?

When I use XPath to crawl and parse the content of Tencent Commonweal, every list returned is empty. My code is below (the header values are hidden). The target URL is https://gongyi.qq.com/succor/project_list.htm#s_tid=75. I would appreciate it if someone could help me solve this problem.

import requests
import os
from lxml import etree

if __name__ =='__main__':

    url = 'https://gongyi.qq.com/succor/project_list.htm#s_tid=75'
    headers = {
        'User-Agent': XXX    }
    response = requests.get(url=url,headers=headers)
    page_text = response.text
    tree = etree.HTML(page_text)
    li_list = tree.xpath('//div[@class="pro_main"]//li')
    for li in li_list:
        title = li.xpath('./div[2]/div/a/text()')[0]
        print(title)

Upvotes: 1

Views: 977

Answers (1)

BiOS

Reputation: 2304

What is actually happening is that you can only reach the first ul inside the pro_main div. The li items you are after, and their parent, are populated by JavaScript. requests.get() downloads the initial HTML without running any scripts, so those elements are not there when you parse the response, and your list comes back empty.

The good news is that the script in question populates the data from an API, so you can do exactly what the website does: call that API directly, retrieve the titles, and print them.

import json

import requests

if __name__ == '__main__':

    # Endpoint the page's JavaScript calls to load the project list
    url = 'https://ssl.gongyi.qq.com/cgi-bin/WXSearchCGI?ptype=stat&s_status=1&s_tid=75'
    resp = requests.get(url).text
    # The result is wrapped in (), so strip the first and last characters
    resp = resp.strip()[1:-1]
    jj = json.loads(resp)

    # Each entry in "plist" is one project
    for i in jj["plist"]:
        title = i["title"]
        print(title)
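
Note that the s_tid=75 in the API URL matches the #s_tid=75 fragment of the page URL. The part of a URL after # is never sent to the server, which is another reason the plain requests.get() on the page cannot return category-specific content. Below is a minimal sketch of deriving the API URL from the page URL. It assumes the fragment's s_tid maps directly onto the API's s_tid parameter, which the two URLs suggest but nothing here confirms; API_BASE and api_url_from_page are hypothetical names used only for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint taken from the answer above; the other query values are copied as-is.
API_BASE = "https://ssl.gongyi.qq.com/cgi-bin/WXSearchCGI"


def api_url_from_page(page_url: str) -> str:
    """Build the API URL for a project_list page URL (assumed mapping)."""
    # The fragment ("s_tid=75") lives only on the client side.
    fragment = urlsplit(page_url).fragment
    s_tid = parse_qs(fragment).get("s_tid", [""])[0]
    if not s_tid:
        raise ValueError("page URL has no s_tid in its fragment")
    # Assumption: the API's s_tid is the same value as the page fragment's.
    query = urlencode({"ptype": "stat", "s_status": "1", "s_tid": s_tid})
    return f"{API_BASE}?{query}"


print(api_url_from_page("https://gongyi.qq.com/succor/project_list.htm#s_tid=75"))
```

If the mapping holds, changing the number after #s_tid= should let you fetch other categories the same way.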

You can explore the API by printing jj to see if there's more info that you may need later!
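
For exploring the response, slicing with [1:-1] breaks if the server ever adds trailing whitespace, a semicolon, or a callback name in front of the parentheses. Here is a rough sketch of a more tolerant unwrapping step plus a pretty-printer. The helper names unwrap_parenthesised_json and extract_titles are made up for this example, and the sample string is invented to mimic the wrapped shape described above; it is not real API output.

```python
import json


def unwrap_parenthesised_json(text: str):
    """Parse JSON that may be wrapped as ( ... ) or callback( ... );"""
    text = text.strip().rstrip(";")
    start = text.find("(")
    end = text.rfind(")")
    # Strip the wrapper only when the text does not already start as JSON.
    if not text.startswith(("{", "[")) and start != -1 and end > start:
        text = text[start + 1:end]
    return json.loads(text)


def extract_titles(payload: dict) -> list:
    """Return the "title" of each "plist" entry, skipping entries without one."""
    return [item["title"] for item in payload.get("plist", []) if "title" in item]


# Invented sample in the wrapped shape; replace with requests.get(url).text.
sample = '({"plist": [{"title": "Project A"}, {"title": "Project B"}, {"id": 3}]})'
payload = unwrap_parenthesised_json(sample)

# ensure_ascii=False keeps Chinese titles readable when printing.
print(json.dumps(payload, indent=2, ensure_ascii=False))
print(extract_titles(payload))
```

Pretty-printing the whole payload once is usually the quickest way to see which other keys are available per project.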

Let me know if it works for you!

Upvotes: 1
