Reputation: 36307
I'm working with Scrapy. I want to generate a unique user agent for each request. I have the following:
class ContactSpider(Spider):
    name = "contact"

    def getAgent(self):
        f = open('useragentstrings.txt')
        agents = f.readlines()
        return random.choice(agents).strip()

    headers = {
        'user-agent': getAgent(),
        'content-type': "application/x-www-form-urlencoded",
        'cache-control': "no-cache"
    }

    def parse(self, response):
        open_in_browser(response)
getAgent picks a random user agent from a file whose lines have the form:
"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36"
However, when I run this I get:
  File "..spiders\contact_spider.py", line 35, in <module>
    class ContactSpider(Spider):
  File "..spiders\contact_spider.py", line 54, in ContactSpider
    'user-agent': getAgent(),
TypeError: getAgent() takes exactly 1 argument (0 given)
Upvotes: 2
Views: 2021
Reputation: 474003
getAgent() is defined as an instance method, so it expects a ContactSpider instance as its first argument (self). Inside the class body it is called as a plain function with no arguments, hence the TypeError. The function does not need to be a member of your spider class at all: move it to a separate "helpers"/"utils"/"libs" module and import it:
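from scrapy import Spider
from scrapy.utils.response import open_in_browser

from helpers import getAgent  # module-level function that takes no arguments


class ContactSpider(Spider):
    name = "contact"

    headers = {
        'user-agent': getAgent(),
        'content-type': "application/x-www-form-urlencoded",
        'cache-control': "no-cache"
    }

    def parse(self, response):
        open_in_browser(response)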
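For reference, the helpers module could look roughly like the sketch below. It assumes the file name useragentstrings.txt from the question and one user agent string per line; the `path` parameter and the filtering of blank lines are additions for illustration, not part of the original code.

```python
# helpers.py (hypothetical module name, matching the import above)
import random


def getAgent(path='useragentstrings.txt'):
    """Return one random user agent string from the file at `path`.

    Assumes one user agent per line; blank lines are skipped.
    """
    with open(path) as f:  # the context manager closes the file afterwards
        agents = [line.strip() for line in f if line.strip()]
    return random.choice(agents)
```

Because this is a plain module-level function, calling getAgent() with no arguments inside the class body works without a spider instance.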
See also: Difference between Class and Instance methods.
Or, as an alternative approach, there is a scrapy-fake-user-agent Scrapy middleware that would rotate user agents seamlessly and randomly. User Agent strings are supplied by the fake-useragent module.
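To illustrate the middleware idea: a class-level headers dict is built once, when the class body runs, so getAgent() there picks a single agent for the whole crawl. A downloader middleware can instead pick an agent for every outgoing request. The following is a minimal hand-rolled sketch, not the scrapy-fake-user-agent implementation; the class name RotateUserAgentMiddleware and the USER_AGENT_FILE setting are hypothetical names chosen for this example.

```python
import random


class RotateUserAgentMiddleware:
    """Sketch of a downloader middleware that sets a random User-Agent per request."""

    def __init__(self, agents):
        self.agents = list(agents)

    @classmethod
    def from_crawler(cls, crawler):
        # USER_AGENT_FILE is a hypothetical setting name used for this sketch;
        # it falls back to the file name from the question.
        path = crawler.settings.get('USER_AGENT_FILE', 'useragentstrings.txt')
        with open(path) as f:
            agents = [line.strip() for line in f if line.strip()]
        return cls(agents)

    def process_request(self, request, spider):
        # Called for each outgoing request; overwrite the User-Agent header.
        request.headers['User-Agent'] = random.choice(self.agents)
        return None  # returning None lets Scrapy continue handling the request
```

To try it, the class would be registered in the project's DOWNLOADER_MIDDLEWARES setting under its import path (for example a hypothetical 'myproject.middlewares.RotateUserAgentMiddleware' with priority 400), with the built-in 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware' set to None so it does not compete for the same header.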
Upvotes: 2