Reputation: 47
Here is the relevant part of my spider:
def parse(self, response):
    titles = HtmlXPathSelector(response).select('//li')
    for title in titles:
        item = EksidefeItem()
        item['favori'] = title.select("//*[@id='entry-list']/li/@data-favorite-count").extract()
        item['entry'] = ['<a href=https://eksisozluk.com%s'%a for a in title.select("//*[@class='entry-date permalink']/@href").extract()]
        item['yazari'] = title.select("//*[@id='entry-list']/li/@data-author").extract()
        item['basligi'] = title.select("//*[@id='topic']/h1/@data-title").extract()
        item['tarih'] = title.select("//*[@id='entry-list']/li/footer/div[2]/a[1]/text()").extract()
        return item
I am getting the date and time from item['tarih'], but the value is not just a single date and time; it also contains other values. Here is an example of the parsed data:
26.01.2017 20:04 ~ 20:07
I want to keep only the date part (the first 10 characters from the left), as in:
26.01.2017
How can I do that?
Thanks
Upvotes: 1
Views: 165
Reputation: 9647
Consider using item loaders. You can extend the ItemLoader class and write your own custom item loader like this:
from scrapy.loader import ItemLoader
from scrapy.loader.processors import TakeFirst, MapCompose

def tarih_modifier(value):
    # Keep only the first 10 characters, i.e. the dd.mm.yyyy part
    return value[:10]

class MyCustomLoader(ItemLoader):
    default_output_processor = TakeFirst()
    tarih_in = MapCompose(tarih_modifier)
You can also write this class in a separate module. Now in the parse method you can use this loader class.
def parse(self, response):
    l = MyCustomLoader(item=EksidefeItem(), response=response)
    # The field name must be 'tarih' so that the tarih_in processor is applied
    l.add_xpath('tarih', "//*[@id='entry-list']/li/footer/div[2]/a[1]/text()")
    # add the rest
    return l.load_item()
Using a loader class makes it much more convenient to customize values.
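Since an input processor is just a plain function, it can be tried out without Scrapy. Here is a minimal sketch, assuming the extracted text may carry surrounding whitespace (an assumption; the example in the question does not show any). The name tarih_modifier_stripped is hypothetical and not part of the loader above.

```python
def tarih_modifier_stripped(value):
    # Hypothetical variant of tarih_modifier: strip surrounding whitespace
    # first, then keep the first 10 characters (the dd.mm.yyyy part).
    return value.strip()[:10]


# Sample value taken from the question, padded with spaces for illustration
print(tarih_modifier_stripped("  26.01.2017 20:04 ~ 20:07 "))
```

If this behaves as wanted, it could be passed to MapCompose in place of tarih_modifier.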
Upvotes: 1
Reputation: 5029
You could use string slicing to get just the date:
item['tarih'] = title.select("//*[@id='entry-list']/li/footer/div[2]/a[1]/text()").extract()
item['tarih'][0] = item['tarih'][0][:10]
But I would also add some validation (take a look at datetime.datetime.strptime()) to make sure you got a valid date.
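The validation could look roughly like this. It is only a sketch, assuming the date always appears as dd.mm.yyyy at the start of the string, as in the example from the question. The helper name extract_date is made up for illustration.

```python
from datetime import datetime


def extract_date(raw):
    """Return the leading dd.mm.yyyy part of raw, or None if it is not a valid date."""
    candidate = raw.strip()[:10]
    try:
        # strptime raises ValueError when the text does not match the format
        # or is not a real calendar date (for example 31.02.2017)
        datetime.strptime(candidate, "%d.%m.%Y")
    except ValueError:
        return None
    return candidate


print(extract_date("26.01.2017 20:04 ~ 20:07"))
print(extract_date("not a date"))
```

Returning None instead of raising lets the spider decide whether to skip the item or store an empty value.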
Upvotes: 0