Ninja3412

Reputation: 75

How to use a list of regexes when defining an LxmlLinkExtractor rule

I would like to define a list of regexes outside of my Scrapy spider and then pass them to an LxmlLinkExtractor.

I'm currently using the following code to read them from a file:

file = open("myFile.txt")
regexs = [rule.strip() for rule in file.readlines()]
file.close()
return regexs

The returned value is then passed as a parameter as follows:

Rule(LinkExtractor(allow=(regexs, )), callback='parse_file')

This results in the following error:

TypeError: unhashable type: 'list' 

Upvotes: 1

Views: 192

Answers (1)

advance512

Reputation: 1368

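Pass the list itself as allow instead of wrapping it in a tuple. allow accepts a single regular expression or a list of them, so (regexs, ) hands it a one-element tuple whose only element is a list, and that list then gets treated as if it were a pattern. This should work: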

regexs = [rule.strip() for rule in file.readlines()]
Rule(LinkExtractor(allow=regexs), callback='parse_file')
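
For what it's worth, the error can be reproduced without Scrapy. The sketch below uses only the standard library re module and assumes the link extractor ends up compiling each element of allow as a pattern, which matches the error message, though I haven't traced the exact Scrapy code path. The patterns are made up for illustration.

```python
import re

# Hypothetical patterns, standing in for the lines read from myFile.txt.
regexs = [r"/docs/.*\.pdf$", r"/files/\d+"]

# Flat list: every element is a string, so each one compiles fine.
compiled = [re.compile(pattern) for pattern in regexs]

# Tuple wrapping the list: its only element is the list itself.
wrapped = (regexs, )

try:
    # Compiling that element means compiling a list, which re rejects
    # because it uses the pattern as a cache key and lists are unhashable.
    re.compile(wrapped[0])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```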

See the documentation for more on the allow parameter: http://doc.scrapy.org/en/latest/topics/link-extractors.html#module-scrapy.linkextractors.lxmlhtml
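
If it helps, here is a minimal sketch of reading the patterns from a file. The helper name load_patterns is my own and not part of Scrapy. It closes the file via a context manager and skips blank lines, since an empty string is a valid regex that matches every URL.

```python
def load_patterns(path):
    """Return the non-blank lines of `path` as a flat list of regex strings.

    Hypothetical helper: the name and the blank-line handling are
    assumptions for illustration, not anything Scrapy provides.
    """
    with open(path) as handle:
        # Strip surrounding whitespace and newlines from each line.
        stripped = (line.strip() for line in handle)
        # Drop empty lines so no pattern ends up as "".
        return [pattern for pattern in stripped if pattern]
```

In the spider, the result would go straight into the extractor, e.g. Rule(LinkExtractor(allow=load_patterns("myFile.txt")), callback='parse_file').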

Upvotes: 2
