HippyZ

Reputation: 43

How to return items from a custom spider middleware

I've created a custom spider middleware based on OffsiteMiddleware. It is a simple copy and paste of the original class; maybe a better method exists.

I would like to collect the offsite domains that get filtered. My pipeline works.

But I don't know how to return the items to my pipeline.

Thanks for your help.

def process_spider_output(self, response, result, spider):
    items = []
    for x in result:
        if isinstance(x, Request):
            if x.dont_filter or self.should_follow(x, spider):
                yield x
            else:
                domain = urlparse_cached(x).hostname
                if domain and domain not in self.domains_seen[spider]:
                    self.domains_seen[spider].add(domain)
                    # This is where I build my item: items.append(OutboundsLinks(url=domain))
        else:
            yield x

Upvotes: 0

Views: 1313

Answers (1)

akhter wahab

Reputation: 4085

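process_spider_output() must return an iterable of Request or Item objects. Your method is already a generator, so you don't need to collect the items in a list: yield each item at the point where the offsite domain is detected, and it will reach your pipeline like any other item.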

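# Imports needed at the top of the middleware module
from scrapy.http import Request
from scrapy.utils.httpobj import urlparse_cached

def process_spider_output(self, response, result, spider):
    for x in result:
        if isinstance(x, Request):
            if x.dont_filter or self.should_follow(x, spider):
                # Onsite (or explicitly unfiltered) request: pass it through
                yield x
            else:
                # Offsite request: drop it, but report each new domain once
                domain = urlparse_cached(x).hostname
                if domain and domain not in self.domains_seen[spider]:
                    self.domains_seen[spider].add(domain)
                    # OutboundsLinks is the item class from the question
                    yield OutboundsLinks(url=domain)
        else:
            # Items and anything else pass through unchanged
            yield x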
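
To see the generator pattern in isolation, here is a minimal sketch that runs without Scrapy installed. Everything in it is a stand-in for illustration only: the Request class, the dict-based OutboundsLinks, the CollectOffsiteMiddleware name, and the simplified should_follow() are assumptions, not Scrapy's real implementation, and domains_seen is a plain set rather than being keyed by spider.

```python
from urllib.parse import urlparse


class Request:
    """Stand-in for scrapy's Request (hypothetical, only for this sketch)."""

    def __init__(self, url, dont_filter=False):
        self.url = url
        self.dont_filter = dont_filter


class OutboundsLinks(dict):
    """Stand-in for the question's item class; a plain dict subclass here."""


class CollectOffsiteMiddleware:
    """Sketch: drop offsite requests and emit one item per new offsite domain."""

    def __init__(self, allowed_domains):
        self.allowed_domains = set(allowed_domains)
        self.domains_seen = set()

    def should_follow(self, request):
        # Simplified check: host equals an allowed domain or is a subdomain of it
        host = urlparse(request.url).hostname or ""
        return any(host == d or host.endswith("." + d) for d in self.allowed_domains)

    def process_spider_output(self, response, result, spider):
        for x in result:
            if not isinstance(x, Request):
                # Items pass through unchanged
                yield x
            elif x.dont_filter or self.should_follow(x):
                # Onsite request: keep it
                yield x
            else:
                # Offsite request: drop it, yield an item for each new domain
                domain = urlparse(x.url).hostname
                if domain and domain not in self.domains_seen:
                    self.domains_seen.add(domain)
                    yield OutboundsLinks(url=domain)


mw = CollectOffsiteMiddleware(["example.com"])
spider_output = [
    Request("http://example.com/page"),
    Request("http://other.org/a"),
    Request("http://other.org/b"),
    {"title": "scraped item"},
]
output = list(mw.process_spider_output(None, spider_output, None))
```

Here output holds the onsite request, a single OutboundsLinks item for other.org (the second request to that domain is dropped without producing a duplicate item), and the scraped item. In a real project, the yielded items are what the engine hands to the item pipelines.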

Upvotes: 1
