Reputation: 1075
I am simply trying to write scraped data to a CSV file. However, I have two separate for-loops, so the data from each loop is exported independently and the rows come out in the wrong order. Any suggestions?
def parse(self, response):
    hxs = HtmlXPathSelector(response)
    titles = hxs.select('//td[@class="title"]')
    subtext = hxs.select('//td[@class="subtext"]')
    items = []
    for title in titles:
        item = HackernewsItem()
        item["title"] = title.select("a/text()").extract()
        item["url"] = title.select("a/@href").extract()
        items.append(item)
    for score in subtext:
        item = HackernewsItem()
        item["score"] = score.select("span/text()").extract()
        items.append(item)
    return items
As the image below shows, the output of the second for-loop appears below all the other rows instead of alongside them on the same rows, as the header row implies.
CSV image attached:
The full file is on GitHub: https://github.com/nchlswtsn/scrapy/blob/master/items.csv
Upvotes: 2
Views: 227
Reputation: 5600
The order in your CSV file follows the order in which you export the items: first you export all the titles, then all the subtext elements.
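To make the effect concrete, here is a minimal sketch that reproduces it without Scrapy. It assumes plain dicts in place of HackernewsItem and uses made-up titles and scores (none of this is real HN data); the standard csv module stands in for the CSV exporter.

```python
import csv
import io

# Hypothetical stand-ins for the scraped values; not real HN data.
titles = ["Story A", "Story B"]
scores = ["10 points", "20 points"]

items = []
for t in titles:
    # First loop: each item carries only a title.
    items.append({"title": t})
for s in scores:
    # Second loop: separate items that carry only a score.
    items.append({"score": s})

buf = io.StringIO()
writer = csv.DictWriter(
    buf, fieldnames=["title", "score"], restval="", lineterminator="\n"
)
writer.writeheader()
writer.writerows(items)

staggered = buf.getvalue()
print(staggered)
```

Four items go in, so four data rows come out: two with an empty score column followed by two with an empty title column. One item per story is what keeps the columns on the same row.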
I guess you are trying to scrape HN articles. Here is my suggestion:
def parse(self, response):
    hxs = HtmlXPathSelector(response)
    titles = hxs.select('//td[@class="title"]')
    items = []
    for title in titles:
        item = HackernewsItem()
        item["title"] = title.select("a/text()").extract()
        item["url"] = title.select("a/@href").extract()
        item["score"] = title.select('../td[@class="subtext"]/span/text()').extract()
        items.append(item)
    return items
I didn't test it, but it should give you the idea.
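If the relative XPath comes back empty (for example, if the score cell sits in a different table row than the title cell), another option is to keep the two selections and walk them in step. The sketch below shows only the pairing logic on plain lists; pair_rows and the sample values are hypothetical, and it assumes the lists line up one-to-one, which you would need to check against the real page.

```python
import csv
import io


def pair_rows(titles, urls, scores):
    """Build one dict per story by walking the three lists in step.

    zip() stops at the shortest list, so any unmatched trailing
    entries are silently dropped.
    """
    return [
        {"title": t, "url": u, "score": s}
        for t, u, s in zip(titles, urls, scores)
    ]


# Hypothetical sample values; not real HN data.
titles = ["Story A", "Story B"]
urls = ["http://example.com/a", "http://example.com/b"]
scores = ["10 points", "20 points"]

rows = pair_rows(titles, urls, scores)

buf = io.StringIO()
writer = csv.DictWriter(
    buf, fieldnames=["title", "url", "score"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(rows)

paired = buf.getvalue()
print(paired)
```

Because each dict already holds title, url and score, every story ends up on a single CSV row.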
Upvotes: 2
Reputation: 25341
The csv module in Python 2.7 does not support Unicode, so I suggest using unicodecsv instead.
$ pip install unicodecsv
unicodecsv is a drop-in replacement for Python 2's csv module that supports Unicode strings without hassle.
Then use this instead of import csv:
import unicodecsv as csv
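For comparison, on Python 3 the standard csv module works with Unicode text directly, so the replacement is only needed on Python 2. A minimal sketch, assuming Python 3, a temporary file, and made-up sample rows:

```python
import csv
import os
import tempfile

# Hypothetical sample rows containing non-ASCII characters.
rows = [["title", "score"], ["Café déjà vu", "10 points"]]

path = os.path.join(tempfile.mkdtemp(), "items.csv")

# newline="" lets the csv module control line endings itself.
with open(path, "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerows(rows)

# Read the file back to confirm the text round-trips unchanged.
with open(path, encoding="utf-8", newline="") as f:
    read_back = list(csv.reader(f))

print(read_back)
```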
Upvotes: 1