M. Mayouf

Reputation: 41

Multiple Items into one pipeline -- NEO4J database with scrapy use case

I use Scrapy to scrape a social network and then store the data in a Neo4j database.

My challenge is relating two items to each other:

import scrapy
from scrapy import Field

class Person(scrapy.Item):
    name = Field()

class AnotherPerson(scrapy.Item):
    name = Field()

I want to save those two items in my graph database by saying:

Person has relationship with AnotherPerson()

What I need is to send two items to ONE pipeline. How can this be done? I tried sending them together in a list, but Scrapy rejects it as soon as a collection is involved.

Here is my pseudo code:

  1. I get a list of persons (each person has a profile and a list of friends, like on Facebook)

  2. For each person in this list:

    • I open their profile (through a request, sending the response to a callback)
    • I take the response, create an item Person() and fill it
    • I send the item with a "yield"
    • Then I open their friend list (through a request, sending the response to another callback)
    • I now have the friend list page
    • Then, for each friend in this list (the page displays a name and a city):
        • I create an item: AnotherPerson()
        • I fill this item with the name and the city
        • I send the item with a "yield"

I have two pipelines. Both save the data to the database correctly, but I have no idea how to relate the items, because that would have to happen in the same process (i.e. the same pipeline).

I'm not sure if I've been clear, so don't hesitate to ask for clarifications.

Upvotes: 0

Views: 539

Answers (1)

eLRuLL

Reputation: 18799

What about adding the Person item as a field of AnotherPerson? Remember that you can always use the meta parameter on requests to pass information between callbacks.

You could do something like:

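def parse_person(self, response):
    ...
    # Hand the person's identifier to the friend-list callback through meta;
    # it is available there as response.meta["some_key"].
    yield Request(url=someurl, callback=self.parse_anotherperson,
                  meta={"some_key": "some_person_id"})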

Then you could add a reference to the previous Person on your AnotherPerson item as a field or something else.
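As a rough illustration of the pipeline side, here is a minimal sketch under a few assumptions of mine: the friend item carries the owning person's name in a hypothetical "friend_of" field (copied from response.meta in the friend-list callback), the FRIEND_OF relationship type and Person label are made up, and the session is anything with a run(query, parameters) method, such as a Neo4j driver session. A small recording stand-in is used here so the snippet runs without a database. In a real project the session would more likely be created in open_spider or from_crawler rather than passed to the constructor.

```python
class RelationshipPipeline:
    """Sketch of an item pipeline that links a friend to a person.

    Assumes each friend item has a hypothetical "friend_of" field holding
    the name of the person whose friend list it was scraped from.
    """

    # Hypothetical label and relationship type; adapt to your graph model.
    QUERY = (
        "MERGE (p:Person {name: $person_name}) "
        "MERGE (f:Person {name: $friend_name}) "
        "SET f.city = $city "
        "MERGE (p)-[:FRIEND_OF]->(f)"
    )

    def __init__(self, session):
        # Any object exposing run(query, parameters).
        self.session = session

    def process_item(self, item, spider):
        person_name = item.get("friend_of")
        if person_name is None:
            # Not a friend item: pass it along untouched.
            return item
        params = {
            "person_name": person_name,
            "friend_name": item["name"],
            "city": item.get("city"),
        }
        # Both nodes and the relationship are written in a single statement.
        self.session.run(self.QUERY, params)
        return item


class RecordingSession:
    """Stand-in for a database session; it only records the calls."""

    def __init__(self):
        self.calls = []

    def run(self, query, parameters=None):
        self.calls.append((query, parameters))


session = RecordingSession()
pipeline = RelationshipPipeline(session)
# What the friend-list callback might yield for one friend of "Alice".
friend = {"name": "Bob", "city": "Paris", "friend_of": "Alice"}
pipeline.process_item(friend, spider=None)
```

Because the friend item already knows who it belongs to, one pipeline can create the relationship without ever needing two items at once. If you use scrapy.Item subclasses instead of plain dicts, the "friend_of" and "city" fields would have to be declared on AnotherPerson.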

Upvotes: 2
