Scrapy Not Finding div.title

Question

import scrapy


class BookSpider(scrapy.Spider):
    name = "books"
    start_urls = [
        'http://books.toscrape.com/catalogue/page-1.html'
    ]

    def parse(self, response):
        # Use the last path segment without its extension, e.g. "page-1"
        page = response.url.split("/")[-1].split(".")[0]
        filename = f'BooksHTML-{page}.html'
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log(f'Saved file {filename}')

I am using this spider to practice web scraping, and I am trying to collect the titles of all the books on this page. When I open a terminal and run

scrapy shell 'http://books.toscrape.com/catalogue/page-1.html'

and then

response.css("div.title").getall()

it returns only an empty list:

[]

Any clarification would be appreciated.

bas · Accepted Answer

As Tim Roberts pointed out in the comments, the page has no div elements with a class of title, so the selector div.title matches nothing.

The full title of each book on the page is stored in the title attribute of an a (anchor) tag, and that anchor links to the page for that specific book.

You can get the value of the title attribute from every anchor tag that has one like this:

response.css("a::attr(title)").getall()

returns:

['A Light in the Attic', 'Tipping the Velvet', 'Soumission', 'Sharp Objects', 'Sapiens: A Brief History of Humankind', 'The Requiem Red', 'The Dirty Little Secrets of Getting Your Dream Job', 'The Coming Woman: A Novel Based on the Life of the Infamous Feminist, Victoria Woodhull', 'The Boys in the Boat: Nine Americans and Their Epic Quest for Gold at the 1936 Berlin Olympics', 'The Black Maria', 'Starving Hearts (Triangular Trade Trilogy, #1)', "Shakespeare's Sonnets", 'Set Me Free', "Scott Pilgrim's Precious Little Life (Scott Pilgrim #1)", 'Rip it Up and Start Again', 'Our Band Could Be Your Life: Scenes from the American Indie Underground, 1981-1991', 'Olio', 'Mesaerion: The Best Science Fiction Stories 1800-1849', 'Libertarianism for Beginners', "It's Only the Himalayas"]
