Reputation: 75
I would like to know how I can define a list of regexes outside of my Scrapy spider and then read them into an LxmlLinkExtractor.
I'm currently using the following code:
def load_regexs():
    # Read one regex per line; the with block closes the file automatically
    with open("myFile.txt") as file:
        return [rule.strip() for rule in file.readlines()]
The returned value is then passed as a parameter as follows:
Rule(LinkExtractor(allow=(regexs, )), callback='parse_file')
This results in the following error:
TypeError: unhashable type: 'list'
Upvotes: 1
Views: 192
Reputation: 1368
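Pass the list directly instead of wrapping it in a tuple. The allow parameter accepts a single regex or a list of regexes; (regexs, ) is a tuple whose only element is the list, so the list itself ends up being treated as one pattern, which raises the TypeError. This should work: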
regexs = [rule.strip() for rule in file.readlines()]
Rule(LinkExtractor(allow=regexs), callback='parse_file')
See more here about the allow parameter: http://doc.scrapy.org/en/latest/topics/link-extractors.html#module-scrapy.linkextractors.lxmlhtml
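For reference, here is a minimal sketch of loading the patterns from a file using only the standard library. The helper name load_regexes, the sample patterns, and the choice to skip blank lines are assumptions made for illustration, not part of the original code; a temporary file stands in for myFile.txt.

```python
import re
import tempfile
from pathlib import Path


def load_regexes(path):
    """Read one regex per line, skipping blank lines (hypothetical helper)."""
    patterns = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # assumption: blank lines are not meant as patterns
            re.compile(line)  # raises re.error early if a pattern is invalid
            patterns.append(line)
    return patterns


# Demo: a temporary file stands in for myFile.txt, with made-up patterns.
with tempfile.TemporaryDirectory() as tmp:
    rules_file = Path(tmp) / "myFile.txt"
    rules_file.write_text("/category/\\d+\n\n/item/[a-z]+\\.html\n", encoding="utf-8")
    regexs = load_regexes(rules_file)

# regexs is a flat list of strings, one pattern per element.
# Compiling the list itself (instead of each element) fails with a TypeError,
# which is the same kind of failure the question ran into.
try:
    re.compile(regexs)
    list_compiles = True
except TypeError:
    list_compiles = False
```

The resulting flat list can then be passed as Rule(LinkExtractor(allow=regexs), callback='parse_file'). Validating each pattern while loading means a typo in the file shows up at startup rather than in the middle of a crawl.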
Upvotes: 2