eN_Joy

scrapy djangoitem with Foreign Key

This question was asked before (Foreign Keys on Scrapy) without an accepted answer, so I am re-raising it here with a more clearly defined minimal setup:

The django model:

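from django.db import models

class Article(models.Model):
    title = models.CharField(max_length=255)
    content = models.TextField()
    # on_delete is mandatory from Django 2.0 on; SET_NULL fits because the FK is nullable
    category = models.ForeignKey('categories.Category', null=True, blank=True,
                                 on_delete=models.SET_NULL)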

How Category itself is defined is irrelevant here; what matters is that category is a ForeignKey. In the django shell, this works:

c = Article(title="foo", content="bar", category_id=2)
c.save()

The scrapy item:

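from scrapy_djangoitem import DjangoItem  # older Scrapy: from scrapy.contrib.djangoitem import DjangoItem

class BotsItem(DjangoItem):
    django_model = Article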
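DjangoItem generates the item's keys from the model's field names. The toy, pure-Python model below illustrates that check; it is not Scrapy's or Django's code, and FakeField, ARTICLE_FIELDS and SketchItem are hypothetical stand-ins. It assumes, as in Django, that the ForeignKey's field name is category while category_id is only its attname, and that one item key is created per model field name, skipping auto-created fields such as the implicit id.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeField:
    name: str      # model attribute name, e.g. "category"
    attname: str   # underlying column attribute, e.g. "category_id"
    auto_created: bool = False


# Hypothetical stand-in for Article._meta.fields
ARTICLE_FIELDS = [
    FakeField("id", "id", auto_created=True),
    FakeField("title", "title"),
    FakeField("content", "content"),
    FakeField("category", "category_id"),
]


class SketchItem:
    # Keys come from field *names*, so "category_id" is never a valid key
    fields = frozenset(f.name for f in ARTICLE_FIELDS if not f.auto_created)

    def __init__(self):
        self._values = {}

    def __setitem__(self, key, value):
        if key not in self.fields:
            raise KeyError(f"{type(self).__name__} does not support field: {key}")
        self._values[key] = value

    def __getitem__(self, key):
        return self._values[key]
```

Here item['category'] = ... is accepted, while item['category_id'] = 2 raises a KeyError with the same "does not support field" wording that Scrapy reports.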

The scrapy pipeline:

class BotsPipeline(object):
    def process_item(self, item, spider):
        item['category_id'] = 2
        item.save()
        return item

With the above code, scrapy complains:

exceptions.KeyError: 'BotsItem does not support field: category_id'

Fair enough, since category_id does not appear as a field of the django model from which the scrapy item is generated. For the record, if we use this pipeline instead (assuming a category foo exists):

class BotsPipeline(object):
    def process_item(self, item, spider):
        item['category'] = 'foo'
        item.save()
        return item

Now scrapy complains:

exceptions.TypeError: isinstance() arg 2 must be a class, type, or tuple of classes and types

So what exactly should we do?


Answers (1)

eN_Joy

Okay, I managed to solve this problem, so I am posting the solution here for the record. As the last exceptions.TypeError hints, item['category'] expects an instance of the Category class. In my case I am using django-categories, so in the pipeline just replace the assignment with this (assuming the Category table is already populated):

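from categories.models import Category  # django-categories

class BotsPipeline(object):
    def process_item(self, item, spider):
        # A ForeignKey field must be assigned a Category instance, not a name or an id
        item['category'] = Category.objects.get(id=2)
        item.save()
        return item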
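If the spider scrapes a category name (like 'foo') rather than a known id, the same idea can be generalized: let the spider store the name in item['category'] and have the pipeline swap it for an instance before saving. This is a minimal sketch, not part of django-categories or Scrapy. CategoryResolverPipeline and _default_lookup are hypothetical names, it assumes the Category model has a name field, and the lookup is injectable so the sketch runs without Django.

```python
def _default_lookup(name):
    # Imported lazily so this module loads without Django configured.
    # Assumption: django-categories' Category model has a `name` field.
    from categories.models import Category
    return Category.objects.get(name=name)


class CategoryResolverPipeline:
    """Hypothetical pipeline: replaces a scraped category name with a model instance."""

    def __init__(self, lookup=_default_lookup):
        self._lookup = lookup
        self._cache = {}  # name -> instance, so each distinct name costs one lookup

    def process_item(self, item, spider):
        value = item.get("category")
        if isinstance(value, str):
            if value not in self._cache:
                self._cache[value] = self._lookup(value)
            item["category"] = self._cache[value]
        item.save()
        return item
```

Because the constructor's argument has a default, Scrapy can still build the pipeline with no arguments. The cache lives for the whole crawl, so a category deleted mid-crawl would not be noticed, and a missing name raises Category.DoesNotExist just like the get(id=2) call above.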
