Python: Scrapy CSV exports incorrectly?

Question

I am simply trying to write to a csv. However I have two separate for-statements, therefore the data from each for-statement exports independently and breaks order. Suggestions?

def parse(self, response):
        hxs = HtmlXPathSelector(response)
        titles = hxs.select('//td[@class="title"]')
        subtext = hxs.select('//td[@class="subtext"]')
        items = []
        for title in titles:
            item = HackernewsItem()
            item["title"] = title.select("a/text()").extract()
            item["url"] = title.select("a/@href").extract()
            items.append(item)
        for score in subtext:
            item = HackernewsItem()
            item["score"] = score.select("span/text()").extract()
            items.append(item)
        return items

As is apparent in the image below, the second for-statement prints below the others instead of "among" others as header does.

CSV image attached: csv file

and github link for full file: https://github.com/nchlswtsn/scrapy/blob/master/items.csv

ahmed · Accepted Answer

Your order of exporting element is logical to what you find in CSV file, first you exported all the titles then all subtext elements.
I guess you are trying to scrap HN articles, here is my suggestion:

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    titles = hxs.select('//td[@class="title"]')
    items = []
    for title in titles:
        item = HackernewsItem()
        item["title"] = title.select("a/text()").extract()
        item["url"] = title.select("a/@href").extract()
        item["score"] = title.select('../td[@class="subtext"]/span/text()').extract()
        items.append(item)
    return items

I didn't test it, but it will give you an idea.

Python: Scrapy CSV exports incorrectly?

Answers (2)

Related Questions