user2492364

Reputation: 6713

Scrapy: change the way the data is output

I have a problem that I haven't been able to figure out for a while.

Because of the website's structure, the data I scrape into a JSON file looks like this:

[{"leisurelocation": ["(\u5357\u6295)", "(\u53f0\u5357)", "(\u53f0\u5357)"],
"leisuretitle": ["2014", "20140721", "20140726"]}]

But the format I want is:

[{"leisurelocation": ["(\u5357\u6295)"], "leisuretitle": ["2014"]},
{"leisurelocation": ["(\u53f0\u5357)"], "leisuretitle": ["20140721"]},
{"leisurelocation": ["(\u53f0\u5357)"], "leisuretitle": ["20140726"]}]

I don't know how to do this. Can someone please guide me a bit?

Here is my code:

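from scrapy.selector import Selector

# LeisureItem is the Item subclass defined in my project.
def parse(self, response):
    sel = Selector(response)
    # Each match here is a whole table, not a single row.
    sites = sel.css("div#listabc table")
    for site in sites:
        item = LeisureItem()
        # extract() returns a list with every match inside the table.
        leisurelocation = site.css("tr > td.subject > span.city::text").extract()
        leisuretitle = site.css("tr > td.subject a::text").extract()

        item['leisurelocation'] = leisurelocation
        item['leisuretitle'] = leisuretitle
        yield item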

Upvotes: 0

Views: 73

Answers (2)

Arthur Burkhardt

Reputation: 700

kev's answer is correct for the problem as you defined it, but I don't think it is the right approach. You should scrape the items one at a time.

For example, loop over the table row by row and yield each scraped row as an item:

def parse(self, response):
    for city in response.css("div#listabc table>tr"):
        item = LeisureItem()
        item['leisurelocation'] = city.css("td.subject>span.city::text").extract()
        item['leisuretitle'] = city.css("td.subject a::text").extract()
        yield item
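
To illustrate why going row by row can be more robust, here is a minimal sketch that does not use Scrapy at all: it swaps in the standard library's xml.etree.ElementTree so it can run on its own. The markup in SAMPLE_HTML and the helper name rows_to_items are made up for the illustration (the real page's structure is an assumption), and the second row deliberately has no city span.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the real page (an assumption).
# The second row has no <span class="city"> on purpose.
SAMPLE_HTML = """
<div id="listabc">
  <table>
    <tr><td class="subject"><span class="city">(Nantou)</span><a href="#">2014</a></td></tr>
    <tr><td class="subject"><a href="#">20140721</a></td></tr>
    <tr><td class="subject"><span class="city">(Tainan)</span><a href="#">20140726</a></td></tr>
  </table>
</div>
"""


def rows_to_items(markup):
    """Build one dict per table row, mirroring the row-by-row loop above."""
    root = ET.fromstring(markup)
    items = []
    for row in root.findall("./table/tr"):
        # Both lookups are scoped to the current row only.
        cities = [s.text for s in row.findall("./td[@class='subject']/span[@class='city']")]
        titles = [a.text for a in row.findall("./td[@class='subject']//a")]
        items.append({"leisurelocation": cities, "leisuretitle": titles})
    return items


for item in rows_to_items(SAMPLE_HTML):
    print(item)
```

Because each lookup is limited to one row, a row without a city simply gets an empty leisurelocation list and the other rows keep their own city. With two flat lists collected from the whole table, the same page would give two cities and three titles, and pairing them by position would attach the wrong city to a title.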

Upvotes: 0

kev

Reputation: 161954

What you want is to generate multiple items from leisurelocation and leisuretitle:

leisurelocation = ...
leisuretitle =  ...

for i,j in zip(leisurelocation, leisuretitle):
    yield LeisureItem(leisurelocation=[i], leisuretitle=[j])
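
As a rough sketch of what the zip loop does, here is the same pairing applied to the sample data from the question, using plain dicts in place of LeisureItem so it runs without Scrapy. The helper name split_into_items is hypothetical and only exists for this example.

```python
import json

# Sample values copied from the JSON output shown in the question.
locations = ["(\u5357\u6295)", "(\u53f0\u5357)", "(\u53f0\u5357)"]
titles = ["2014", "20140721", "20140726"]


def split_into_items(locations, titles):
    """Pair the two parallel lists by position and build one dict per pair."""
    return [
        {"leisurelocation": [loc], "leisuretitle": [title]}
        for loc, title in zip(locations, titles)
    ]


items = split_into_items(locations, titles)
# json.dumps escapes non-ASCII characters by default, like the output in the question.
print(json.dumps(items))
```

Note that zip stops at the shorter of the two lists, so this only gives the intended result when both lists have the same length and line up entry by entry.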

Upvotes: 1
