Reputation: 6713
I have a problem that I haven't been able to figure out for a while.
Because of the website's structure, the data I scrape ends up in the JSON file like this:
[{"leisurelocation": ["(\u5357\u6295)", "(\u53f0\u5357)", "(\u53f0\u5357)"],
"leisuretitle": ["2014", "20140721", "20140726"]}]
But the format I want is:
[{"leisurelocation": ["(\u5357\u6295)"], "leisuretitle": ["2014"]},
{"leisurelocation": ["(\u53f0\u5357)"], "leisuretitle": ["20140721"]},
{"leisurelocation": ["(\u53f0\u5357)"], "leisuretitle": ["20140726"]}]
Here is my code:
def parse(self, response):
    sel = Selector(response)
    sites = sel.css("div#listabc table")
    for site in sites:
        item = LeisureItem()
        leisurelocation = site.css("tr > td.subject > span.city::text").extract()
        leisuretitle = site.css("tr > td.subject a::text").extract()
        item['leisurelocation'] = leisurelocation
        item['leisuretitle'] = leisuretitle
        yield item
I don't know how to do it. Can someone please guide me a bit?
Upvotes: 0
Views: 73
Reputation: 700
kev's answer is correct for the problem as you defined it, but I don't think it's the right approach. You should scrape the items one by one, one item per table row, instead of collecting whole columns and splitting them afterwards.
For example, loop over the table row by row and yield each scraped row as an item:
def parse(self, response):
    for city in response.css("div#listabc table>tr"):
        item = LeisureItem()
        item['leisurelocation'] = city.css("td.subject>span.city::text").extract()
        item['leisuretitle'] = city.css("td.subject a::text").extract()
        yield item
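To see why looping over rows changes the output, here is a minimal sketch that swaps Scrapy's selectors for the standard library's xml.etree.ElementTree so it runs without Scrapy. The markup is a hypothetical snippet guessed from the CSS selectors in this thread, and the names parse_columns and parse_rows are illustrative only: parse_columns mirrors the question's code (one item for the whole table), parse_rows mirrors the loop above (one item per <tr>).

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, guessed from the question's CSS selectors.
# The real page may differ; ElementTree also requires well-formed XML.
HTML = """
<div id="listabc">
  <table>
    <tr><td class="subject"><span class="city">(\u5357\u6295)</span><a href="#">2014</a></td></tr>
    <tr><td class="subject"><span class="city">(\u53f0\u5357)</span><a href="#">20140721</a></td></tr>
    <tr><td class="subject"><span class="city">(\u53f0\u5357)</span><a href="#">20140726</a></td></tr>
  </table>
</div>
"""


def parse_columns(root):
    """Mirror the question's code: one item holding every row's values."""
    table = root.find(".//table")
    return {
        "leisurelocation": [s.text for s in table.findall(".//span[@class='city']")],
        "leisuretitle": [a.text for a in table.findall(".//td[@class='subject']/a")],
    }


def parse_rows(root):
    """Mirror the row-by-row answer: yield one item per <tr>."""
    cell = "td[@class='subject']"
    for row in root.findall(".//table/tr"):
        yield {
            "leisurelocation": [s.text for s in row.findall(cell + "/span[@class='city']")],
            "leisuretitle": [a.text for a in row.findall(cell + "/a")],
        }


root = ET.fromstring(HTML.strip())
column_item = parse_columns(root)
row_items = list(parse_rows(root))

print(column_item)  # a single item with three values per field
for item in row_items:
    print(item)     # three items with one value per field
```

Note that ElementTree's [@class='subject'] is an exact string match, unlike the CSS class selector td.subject, and it cannot parse real-world HTML, so this only illustrates the loop structure; in the spider itself, keep using response.css as in the answer.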
Upvotes: 0
Reputation: 161954
What you want is to generate multiple items from leisurelocation and leisuretitle, pairing the values by position:
leisurelocation = ...
leisuretitle = ...
for i, j in zip(leisurelocation, leisuretitle):
    yield LeisureItem(leisurelocation=[i], leisuretitle=[j])
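Applied to the column-layout record from the question, the zip pairing produces exactly the desired format. This is a minimal, Scrapy-free sketch: split_columns is a hypothetical helper name, and the input is assumed to be the record shown in the question with its key named leisurelocation, as in the item. The same function could also post-process a JSON file that was already scraped in the column layout.

```python
import json


def split_columns(record):
    """Turn one column-layout record into one dict per position.

    zip() pairs the i-th location with the i-th title; each value is
    wrapped in a one-element list to match the format wanted in the question.
    """
    return [
        {"leisurelocation": [loc], "leisuretitle": [title]}
        for loc, title in zip(record["leisurelocation"], record["leisuretitle"])
    ]


# The column-layout record from the question.
record = {
    "leisurelocation": ["(\u5357\u6295)", "(\u53f0\u5357)", "(\u53f0\u5357)"],
    "leisuretitle": ["2014", "20140721", "20140726"],
}

rows = split_columns(record)
# json.dumps escapes non-ASCII by default, giving the \uXXXX form shown in the question.
print(json.dumps(rows))
```

One caveat with this approach: zip stops at the shorter list, and if a row has no span.city the two lists shift out of alignment without any error. On Python 3.10+, zip(..., strict=True) at least raises ValueError on a length mismatch; scraping row by row avoids the problem entirely.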
Upvotes: 1