Reputation: 41
I am practicing web crawling to get text from a website, but I have a problem with my 'headers = headers' argument. When I run the .py file, it returns this:
AttributeError: 'set' object has no attribute 'items'
My code is below:
import requests
import time
import re

headers = {'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'}
f = open('/Users/pgao/Desktop/doupo.rtf','a+')

def get_info(url):
    res = requests.get(url, headers = headers)
    if res.status_code == 200:
        contents = re.findall('<p>(.*?)</p>', res.content.decode('utf-8'),re.S)
        for content in contents:
            f.write(content+'\n')
    else:
        pass

if __name__ == '__main__':
    urls = ['http://www.doupoxs.com/doupocangqiong/{}.html'.format(str(i)) for i in range(2,10)]
    for url in urls:
        get_info(url)
        time.sleep(1)
f.close()
I am struggling to understand why 'headers = headers' is used, since sometimes web scraping works without it and sometimes it is required. The results I found by googling were not that helpful.
Upvotes: 0
Views: 80
Reputation: 4826
From the docs, headers for requests.get() must be a dict.
If you’d like to add HTTP headers to a request, simply pass in a dict to the headers parameter.
You have passed a set. Sets do not have an items() method. That is why you are getting this AttributeError.
headers = {'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'}
print(type(headers))
# <class 'set'>
Add a key to your headers variable.
headers = {'User-Agent': 'Mozilla/5.0 .....'}
Edit: Updated key value for "User-Agent" header.
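To make the difference concrete, here is a minimal sketch in plain Python (no network call, and requests itself is not used). The names ua, as_set, as_dict and has_items are made up for illustration; the user-agent string is the one from the question.

```python
# The user-agent string from the question.
ua = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 '
      '(KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36')

as_set = {ua}                  # braces around a bare value create a set
as_dict = {'User-Agent': ua}   # braces around key: value create a dict


def has_items(obj):
    """Return True if obj has the items() method that a dict provides."""
    return hasattr(obj, 'items')


print(type(as_set).__name__, has_items(as_set))    # set False
print(type(as_dict).__name__, has_items(as_dict))  # dict True
```

The set has no items() method, which matches the AttributeError in the question; the dict does, so it can be passed as headers.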
Upvotes: 0
Reputation: 77407
The header needs to be a dict, but you created a set. The syntax is similar, but notice how the following has a key:value pair:
header = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'}
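If you want this kind of mistake to fail with a clearer message, one option is a small guard that runs before the request is made. This is only a sketch: check_headers is a hypothetical helper, not part of requests, and the set in the second call uses a shortened user-agent string just to keep the example brief.

```python
from collections.abc import Mapping


def check_headers(headers):
    """Hypothetical guard: raise early if headers is not a dict-like mapping."""
    if not isinstance(headers, Mapping):
        raise TypeError(
            'headers must be a dict of name: value pairs, got '
            + type(headers).__name__
        )
    return headers


header = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'}
check_headers(header)  # a dict passes and is returned unchanged

try:
    check_headers({'Mozilla/5.0'})  # a set, like the one in the question
except TypeError as exc:
    message = str(exc)
    print(message)
```

The value returned by check_headers(header) could then be passed as the headers argument in the question's get_info function.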
Upvotes: 0