problem with 'headers = headers' in web crawling

Question

I am practicing web crawling to extract text from a website, but I have a problem with my 'headers = headers' argument. When I run the .py file, it raises this error:

AttributeError: 'set' object has no attribute 'items'

My code is below:

import requests
import time
import re


headers = {'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'}

f = open('/Users/pgao/Desktop/doupo.rtf','a+')

def get_info(url):
    res = requests.get(url, headers = headers)
    if res.status_code == 200:
        contents = re.findall('(.*?)', res.content.decode('utf-8'),re.S)
        for content in contents:
            f.write(content+'\n')
    else:
        pass

if __name__ == '__main__':
    urls = ['http://www.doupoxs.com/doupocangqiong/{}.html'.format(str(i)) for i in range(2,10)]
    for url in urls:
        get_info(url)
        time.sleep(1)

f.close()

I am also struggling to understand when 'headers = headers' is needed: sometimes web scraping works without it, and sometimes it does not. The results I found by googling were not very helpful.

tdelaney · Accepted Answer

The headers argument needs to be a dict, but you created a set. The syntax is similar (both use curly braces), but notice how the following has a key: value pair:

header = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'}
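To make the difference concrete, here is a minimal sketch using only built-in types. The names `UA`, `as_set`, `as_dict`, and `check_headers` are hypothetical and not part of the question's code or of requests. The traceback in the question suggests that `.items()` gets called on whatever is passed as `headers`, which a set cannot support.

```python
from collections.abc import Mapping

UA = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 '
      '(KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36')

as_set = {UA}                 # braces around a bare value -> set
as_dict = {'User-Agent': UA}  # braces around key: value   -> dict

print(type(as_set).__name__, hasattr(as_set, 'items'))
print(type(as_dict).__name__, hasattr(as_dict, 'items'))


def check_headers(headers):
    """Hypothetical helper: fail early with a clearer message than
    "'set' object has no attribute 'items'"."""
    if not isinstance(headers, Mapping):
        raise TypeError(
            'headers must map header names to values, got '
            + type(headers).__name__
        )
    return headers


check_headers(as_dict)        # passes through unchanged
try:
    check_headers(as_set)     # the mistake from the question
except TypeError as exc:
    print(exc)
```

With the dict form, the call in the question would become `requests.get(url, headers=as_dict)`.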
