Where to define custom item loaders in Scrapy?

I'm starting to work with item loaders in Scrapy, and the basic functionality works fine, as in:

l.add_xpath('course_title', '//*[@class="course-header-ng__main-info__name__title"]//text()')

But if I want to apply a function to this field, where do I define the function?

Another question has this example:

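from scrapy.loader import ItemLoader
from scrapy.loader.processors import Compose, MapCompose, Join, TakeFirst
# Strip every extracted string, then join the results with spaces
clean_text = Compose(MapCompose(lambda v: v.strip()), Join())
# Take the first extracted value and convert it to int
to_int = Compose(TakeFirst(), int)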

class MyItemLoader(ItemLoader):
    default_item_class = MyItem
    full_name_out = clean_text
    bio_out = clean_text
    age_out = to_int
    weight_out = to_int
    height_out = to_int

Does this go in place of the generated item template?

import scrapy


class MoocsItem(scrapy.Item):
    # define the fields for your item here like:
    description = scrapy.Field()
    course_title = scrapy.Field()

Can I use functions that are one-liners, such as this?

clean_text = Compose(MapCompose(lambda v: v.strip()), Join())

Upvotes: 1

Views: 972

Answers (1)

Tarun Lalwani

Reputation: 146560

There are two places where you can attach a processor such as clean_text.

Approach 1

You can attach the processor to the field in your Item class, like this:

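import scrapy
from scrapy.loader.processors import Compose, MapCompose, Join

# Module-level processor, defined above the item class (e.g. in items.py)
clean_text = Compose(MapCompose(lambda v: v.strip()), Join())

class MoocsItem(scrapy.Item):
    description = scrapy.Field()
    course_title = scrapy.Field(output_processor=clean_text)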

Then use it like this:

from scrapy.loader import ItemLoader
l = ItemLoader(item=MoocsItem(), response=response)
l.add_xpath('course_title', '//*[@class="course-header-ng__main-info__name__title"]//text()')

item = l.load_item()

This code would of course go inside a spider callback, where response is available.
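To make it clearer what the clean_text output processor does to the list of strings collected by add_xpath, here is a plain-Python sketch that mimics it without importing Scrapy. The name clean_text_sketch and the sample strings are made up for illustration. It assumes every extracted value is a string.

```python
def clean_text_sketch(values):
    """Mimic Compose(MapCompose(lambda v: v.strip()), Join()) on a list of strings."""
    # MapCompose step: apply strip() to each extracted value
    stripped = [v.strip() for v in values]
    # Join step: Join() uses a single space as its default separator
    return " ".join(stripped)


# Hypothetical text nodes, as //text() might return them
extracted = ["  Machine Learning ", "\n for Beginners\n"]
result = clean_text_sketch(extracted)
print(result)
```

Because the processor is applied when load_item() runs, the field ends up holding one cleaned string instead of a list of raw text nodes.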

Approach 2

Another way is to create your own loader:

class MoocsItemLoader(ItemLoader):
    default_item_class = MoocsItem
    # The attribute name must be "<field name>_out" to match the course_title field
    course_title_out = clean_text

Then use the loader in a callback, like this:

from scrapy.loader import ItemLoader
l = MoocsItemLoader(response=response)
l.add_xpath('course_title', '//*[@class="course-header-ng__main-info__name__title"]//text()')

item = l.load_item()

As you can see, with this approach you don't need to pass in an item instance, because the loader creates one from default_item_class.
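If both approaches are used at once, Scrapy's documented precedence is: the loader attribute named after the field plus _out first, then the output_processor key in the field metadata, then the loader default. The sketch below is a simplified plain-Python model of that lookup, not Scrapy's actual implementation. resolve_output_processor, join_stripped, take_first and the dictionaries are hypothetical stand-ins.

```python
def identity(values):
    # Stand-in for the loader's default output processor (values unchanged)
    return values


def join_stripped(values):
    return " ".join(v.strip() for v in values)


def take_first(values):
    return values[0] if values else None


def resolve_output_processor(field_name, loader_attrs, field_meta, default=identity):
    """Pick the output processor for a field, in a simplified precedence order."""
    # 1. Loader attribute named "<field_name>_out"
    proc = loader_attrs.get(f"{field_name}_out")
    if proc is not None:
        return proc
    # 2. "output_processor" key in the field's metadata
    proc = field_meta.get(field_name, {}).get("output_processor")
    if proc is not None:
        return proc
    # 3. Loader default
    return default


field_meta = {
    "course_title": {"output_processor": take_first},
    "description": {},
}

# The loader attribute matches the field name, so it wins over field metadata
from_loader = resolve_output_processor(
    "course_title", {"course_title_out": join_stripped}, field_meta
)
# The attribute name does not match the field, so field metadata is used
from_meta = resolve_output_processor(
    "course_title", {"title_out": join_stripped}, field_meta
)
# Nothing is declared for description, so the default applies
fallback = resolve_output_processor("description", {}, field_meta)
```

The second call shows why the attribute name matters: a loader attribute whose prefix is not exactly the field name is never consulted for that field.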

Upvotes: 3
