Scrapy csv output not having multiple rows for each column

Question

I am trying to scrape an event website and I have the attached code to scrape the event name and location. I write the output to a csv file, but then the csv file has all the event names appended to each other in a single line.

For example, suppose I have two events Bruno Mars and Maroon 5, and their locations as San Jose, Santa Clara. The current output is,

event_name event_location

Bruno Mars, Maroon 5 San Jose, Santa Clara

But I was hoping to see,

event_name event_location

Bruno Mars San Jose

Maroon 5 Santa Clara.

Can someone please let me know why this formatting is getting weird for me? I have attached the code here. I then use scrapy crawl event_spider -o output.csv -t csv to run my code.

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

from event_test.items import EventItem


class EventSpider(BaseSpider):
    name = "event_spider"
    allowed_domains = ["eventful.com"]
    start_urls = [
         "http://eventful.com/sanjose/events"
    ]

   def parse(self, response):
     hxs = HtmlXPathSelector(response)
     events = hxs.select("/html/body[@id='events']/div[@id='outer-container']/div[@id='mid-container']/div[@id='inner-container']/div[@id='content']/div[@class='cols-2-1']/div[@class='alpha']/div[@id='top-events']/div[@class='section top-events cage-dbl-border cage-bdr-mdgrey']/div[@id='events-scroll']/div[@id='events-scroll-items']/ul[@id='events-scroll-items-list']/li[@class='top-events-item  ']")
     items = []
     for event in events:
        item = EventItem()
        item['event_name'] = event.select("//h2/a/span/text()").extract()
        item['event_locality'] = event.select("//span[@class='locality']/text()").extract()
        items.append(item)
     return items

alecxe · Accepted Answer

I've simplified the code and xpaths in your spider:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from event_test.items import EventItem


class EventSpider(BaseSpider):
    name = "event_spider"
    allowed_domains = ["eventful.com"]
    start_urls = ["http://eventful.com/sanjose/events"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        events = hxs.select("//li[contains(@class, 'top-events-item')]")
        for event in events:
            item = EventItem()
            item['event_name'] = event.select(".//h2/a/span/text()").extract()[0]
            item['event_locality'] = event.select(".//span[@class='locality']/text()").extract()[0]
            yield item

Here's what you'll get in the csv file:

event_name,event_locality
Under the Influence of Music Tour,Mountain View
Bruno Mars,San Jose
John Mayer: Born & Raised Tour 2013,Mountain View
New Kids on the Block with 98 Degrees and ...,San Jose
Amy Grant,San Jose
Styx,Saratoga
Bob Dylan with Wilco,Mountain View
Kenny Chesney with Eli Young Band,Mountain View
Smash Mouth \/ Sugar Ray \/ Gin Blossoms \...,Saratoga
Creedence Clearwater Revisited \/ 38 Special,Saratoga

Hope that helps.

Scrapy csv output not having multiple rows for each column

Answers (1)

Related Questions