Reputation: 298
The default field order in Scrapy's output is alphabetical. I have read some posts suggesting using an OrderedDict to output an item's fields in a customized order.
I wrote a spider following this post: How to get order of fields in Scrapy item
My items.py:
import scrapy
from collections import OrderedDict
class OrderedItem(scrapy.Item):
    def __init__(self, *args, **kwargs):
        # Store field values in an OrderedDict so insertion order is kept
        self._values = OrderedDict()
        if args or kwargs:
            for k, v in dict(*args, **kwargs).items():
                self[k] = v
class StockinfoItem(OrderedItem):
    name = scrapy.Field()
    phone = scrapy.Field()
    address = scrapy.Field()
My simple spider file:
import scrapy
from info.items import InfoItem
class InfoSpider(scrapy.Spider):
    name = 'info'  # must match the name used in "scrapy crawl info"
    allowed_domains = ['quotes.money.163.com']
    start_urls = ["http://quotes.money.163.com/f10/gszl_600023.html"]
    def parse(self, response):
        item = InfoItem()
        item["name"] = response.xpath('/html/body/div[2]/div[4]/table/tr[2]/td[2]/text()').extract()
        item["phone"] = response.xpath('/html/body/div[2]/div[4]/table/tr[7]/td[4]/text()').extract()
        item["address"] = response.xpath('/html/body/div[2]/div[4]/table/tr[2]/td[4]/text()').extract()
        item.items()  # has no effect: the return value is discarded
        yield item
Scrapy's log output when I run the spider:
2019-04-25 13:45:01 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.money.163.com/f10/gszl_600023.html>
{'address': ['浙江省杭州市天目山路152号浙能大厦'],'name': ['浙能电力'],'phone': ['0571-87210223']}
Why can't I get the fields in my desired order, as below?
{'name': ['浙能电力'],'phone': ['0571-87210223'],'address': ['浙江省杭州市天目山路152号浙能大厦']}
Thanks to Gallaecio's advice, I added the following to settings.py:
FEED_EXPORT_FIELDS=['name','phone','address']
Then I ran the spider, writing the output to a CSV file:
scrapy crawl info -o info.csv
The CSV columns are now in my custom order:
cat info.csv
name,phone,address
浙能电力,0571-87210223,浙江省杭州市天目山路152号浙能大
However, Scrapy's debug log still shows the fields in alphabetical order:
2019-04-26 00:16:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.money.163.com/f10/gszl_600023.html>
{'address': ['浙江省杭州市天目山路152号浙能大厦'],
'name': ['浙能电力'],
'phone': ['0571-87210223']}
How can I make the debug log show the fields in my custom order, i.e. produce the following output?
2019-04-26 00:16:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.money.163.com/f10/gszl_600023.html>
{'name': ['浙能电力'],
'phone': ['0571-87210223'],
'address': ['浙江省杭州市天目山路152号浙能大厦']}
Upvotes: 5
Views: 2501
Reputation: 3717
The problem is in the __repr__ method of Item. Its original code is:
def __repr__(self):
    return pformat(dict(self))
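So even if you store the item's values in an OrderedDict and expect the fields to keep their order, this method applies dict() to the item (which, before Python 3.7, did not preserve order) and passes the result to pprint.pformat, which sorts dictionary keys alphabetically by default. Either way, the order is lost.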
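A quick stdlib-only check makes the cause visible: the dict keeps the order in which values were assigned, but pformat() sorts the keys. The sort_dicts parameter used at the end is an assumption about your environment, since it only exists on Python 3.8+.

```python
from pprint import pformat

# Values assigned in the same order as in the spider.
values = {}
values["name"] = ["浙能电力"]
values["phone"] = ["0571-87210223"]
values["address"] = ["浙江省杭州市天目山路152号浙能大厦"]

# The dict itself keeps insertion order (guaranteed since Python 3.7).
print(list(values))  # ['name', 'phone', 'address']

# pformat() sorts keys by default, which is what Item.__repr__ relies on.
sorted_text = pformat(values)
print(sorted_text)

# Python 3.8+ only: sort_dicts=False keeps the insertion order.
ordered_text = pformat(values, sort_dicts=False)
print(ordered_text)
```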
So I suggest overriding it however you like, for example:
import json
from collections import OrderedDict
import scrapy
class OrderedItem(scrapy.Item):
    def __init__(self, *args, **kwargs):
        self._values = OrderedDict()
        if args or kwargs:
            for k, v in dict(*args, **kwargs).items():
                self[k] = v
    def __repr__(self):
        # __repr__ must return a string; json.dumps keeps the OrderedDict's key order
        return json.dumps(OrderedDict(self), ensure_ascii=False)
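And now you get output like this. (The \u escapes appear when json.dumps is called with ensure_ascii left at its default, True; with ensure_ascii=False, as in the code above, the CJK characters are printed as-is.)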
2019-04-30 18:56:20 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.money.163.com/f10/gszl_600023.html>
{"name": ["\u6d59\u80fd\u7535\u529b"], "phone": ["0571-87210223"], "address": ["\u6d59\u6c5f\u7701\u676d\u5dde\u5e02\u5929\u76ee\u5c71\u8def152\u53f7\u6d59\u80fd\u5927\u53a6"]}
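The difference between the escaped log above and readable CJK text comes down to the ensure_ascii flag of json.dumps. A small stdlib-only sketch using two of the values from the question:

```python
import json
from collections import OrderedDict

item_values = OrderedDict([
    ("name", ["浙能电力"]),
    ("phone", ["0571-87210223"]),
])

# Default ensure_ascii=True: non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(item_values)
print(escaped)  # {"name": ["\u6d59\u80fd\u7535\u529b"], "phone": ["0571-87210223"]}

# ensure_ascii=False: characters are written as-is; key order is kept either way.
readable = json.dumps(item_values, ensure_ascii=False)
print(readable)  # {"name": ["浙能电力"], "phone": ["0571-87210223"]}
```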
Upvotes: 3
Reputation: 298
Here is the complete items.py, which prints the debug log in my custom order with CJK characters displayed as-is:
import json
from collections import OrderedDict
import scrapy
class OrderedItem(scrapy.Item):
    def __init__(self, *args, **kwargs):
        self._values = OrderedDict()
        if args or kwargs:
            for k, v in dict(*args, **kwargs).items():
                self[k] = v
    def __repr__(self):
        # ensure_ascii=False prints CJK characters as-is instead of \uXXXX escapes
        return json.dumps(OrderedDict(self), ensure_ascii=False)
class StockinfoItem(OrderedItem):
    name = scrapy.Field()
    phone = scrapy.Field()
    address = scrapy.Field()
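If you would rather keep Scrapy's original pretty-printed layout (which on Python 3 also shows CJK characters as-is), a possible variant is to keep pformat but pass sort_dicts=False. This is a sketch assuming Python 3.8+. FakeItem is a hypothetical stand-in that mimics only the mapping behaviour of scrapy.Item (values kept in self._values), so the example runs without Scrapy installed; in a real project only the __repr__ method would be added to your scrapy.Item subclass.

```python
from collections.abc import MutableMapping
from pprint import pformat


class FakeItem(MutableMapping):
    """Hypothetical stand-in for scrapy.Item: a mapping backed by self._values."""

    def __init__(self, *args, **kwargs):
        self._values = {}  # a plain dict keeps insertion order on Python 3.7+
        for k, v in dict(*args, **kwargs).items():
            self[k] = v

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        self._values[key] = value

    def __delitem__(self, key):
        del self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)

    def __repr__(self):
        # Same as the original pformat(dict(self)), but without sorting the keys.
        return pformat(dict(self), sort_dicts=False)


item = FakeItem()
item["name"] = ["浙能电力"]
item["phone"] = ["0571-87210223"]
item["address"] = ["浙江省杭州市天目山路152号浙能大厦"]
print(repr(item))
```

Compared with the json.dumps approach, this keeps single quotes and pprint's line wrapping, so the log looks like the desired output in the question.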
Upvotes: 0
Reputation: 79
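In your spider, replace item.items() with self.log(list(item.items())). The log message will then be a list of (key, value) tuples in the order you assigned them in your spider. On Python 3 the list() call is needed, because items() returns a view object whose repr reuses the item's own (sorted) repr.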
Another way is to combine the answer you linked in your post with this answer.
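The reason list() matters can be checked with the standard library alone. SortedReprMapping below is a hypothetical stand-in for an item whose __repr__ sorts keys the way Scrapy's does; the sketch assumes the item inherits items() from collections.abc, as Scrapy's Item does through MutableMapping.

```python
from collections.abc import Mapping
from pprint import pformat


class SortedReprMapping(Mapping):
    """Hypothetical stand-in for scrapy.Item: insertion-ordered, sorted repr."""

    def __init__(self, pairs):
        self._values = dict(pairs)

    def __getitem__(self, key):
        return self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)

    def __repr__(self):
        return pformat(dict(self))  # pformat sorts the keys


item = SortedReprMapping([("name", "a"), ("phone", "b"), ("address", "c")])

view_text = repr(item.items())        # ItemsView(...) embeds the sorted repr
list_text = repr(list(item.items()))  # plain list keeps assignment order
print(view_text)  # ItemsView({'address': 'c', 'name': 'a', 'phone': 'b'})
print(list_text)  # [('name', 'a'), ('phone', 'b'), ('address', 'c')]
```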
Upvotes: 0
Reputation: 1801
You can define a custom string representation of your item:
import scrapy
class InfoItem(scrapy.Item):
    # name, phone and address fields declared as before
    def __repr__(self):
        return 'name: {}, phone: {}, address: {}'.format(self.get('name'), self.get('phone'), self.get('address'))
Upvotes: 1