Reputation: 41
I use Scrapy to scrape a social network and then store the data in a Neo4j database.
My challenge is relating two items to each other:
import scrapy

class Person(scrapy.Item):
    name = scrapy.Field()

class AnotherPerson(scrapy.Item):
    name = scrapy.Field()
I want to save those two items in my graph database as a relationship:
Person has a relationship with AnotherPerson
What I need is to send two items to ONE pipeline. How can I do this? I tried sending them as a list, but Scrapy does not accept the list as soon as there is a collection in it.
Here is my pseudo code:
I get a list of people (each person has a profile and a list of friends, as on Facebook)
For each person in this list:
I have two pipelines. Each one saves its data to the database correctly, but I have no idea how to relate the two items, because creating the relationship has to happen in the same process (i.e. the same pipeline).
I'm not sure if I've been clear, so don't hesitate to ask for clarifications.
Upvotes: 0
Views: 539
Reputation: 18799
What about adding the Person item as a field of the AnotherPerson item? Remember that you can always use the meta parameter on requests to pass information between callbacks.
You could do something like:
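from scrapy import Request

# Method of your spider class:
def parse_person(self, response):
    ...
    # Pass the person's identifier along to the next callback via meta.
    yield Request(url=someurl, callback=self.parse_anotherperson,
                  meta={"some_key": "some_person_id"})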
Then you could add a reference to the previous Person on your AnotherPerson item, as a field or something else.
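To make that concrete, here is a minimal sketch of how the second callback and a single pipeline could fit together, assuming the friend item is a plain dict with an extra "person" field. The names friend_item, RelationshipPipeline, run_query and the KNOWS relationship type are all made up for illustration; run_query stands in for whatever code actually executes Cypher against your Neo4j database.

```python
def friend_item(meta, friend_name):
    """Build one item that carries both ends of the relationship.

    `meta` is the dict available as response.meta in the second callback;
    "some_key" is the key used when the Request was created.
    """
    return {"name": friend_name, "person": meta["some_key"]}


class RelationshipPipeline:
    """Single pipeline that stores both people and the link between them."""

    # Hypothetical label and relationship type; adjust to your graph model.
    CYPHER = (
        "MERGE (a:Person {name: $person}) "
        "MERGE (b:Person {name: $friend}) "
        "MERGE (a)-[:KNOWS]->(b)"
    )

    def __init__(self, run_query=None):
        # run_query(query, params) is an assumed callable that would wrap
        # your Neo4j session; the default does nothing.
        self.run_query = run_query or (lambda query, params: None)

    def process_item(self, item, spider):
        # Both names arrive in the same item, so one query can relate them.
        params = {"person": item["person"], "friend": item["name"]}
        self.run_query(self.CYPHER, params)
        return item
```

In parse_anotherperson you would then yield friend_item(response.meta, name) for each friend you extract, so the pipeline never needs to see two separate items.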
Upvotes: 2