finite_diffidence

Reputation: 943

How to scrape a directory full of .html files using scrapy?

I have a folder full of .html files. Is there a way to scrape the data using scrapy?

My attempt:

import scrapy
import os

LOCAL_FOLDER = 'html_files/'
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

class MySpider(scrapy.Spider):
    name = 'mySpider'
    start_urls = [f"file://{BASE_DIR}/{LOCAL_FOLDER}"]

    def parse(self, response):
        rows = response.xpath('//div[@class="data"]//tbody/tr')
        print(rows)

structure:

html_files/
    ├── b.html
    ├── c.html
    ├── d.html
    ├── e.html
    └── f.html

Any guidance would be much appreciated.

Upvotes: 1

Views: 570

Answers (1)

SuperUser

Reputation: 4822

I have created 4 html files (1.html - 4.html) in the html_files directory and read them with this spider:

import scrapy
import os


class TestSpider(scrapy.Spider):
    name = 'tempspider'
    path = r'html_files'
    base_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

    def start_requests(self):
        # Build a file:// URL for every file in the directory and request it,
        # so each local .html file goes through the normal Scrapy pipeline.
        for file in os.listdir(self.path):
            url = 'file:///' + os.path.join(self.base_dir, self.path, file)
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # Print the first text node just to show the file was parsed.
        print(response.xpath('//text()').get())

Output:

[scrapy.core.engine] DEBUG: Crawled (200) <GET file:///...........%5Chtml_files%5C1.html> (referer: None)
[scrapy.core.engine] DEBUG: Crawled (200) <GET file:///...........%5Chtml_files%5C2.html> (referer: None)
[scrapy.core.engine] DEBUG: Crawled (200) <GET file:///...........%5Chtml_files%5C3.html> (referer: None)
[scrapy.core.engine] DEBUG: Crawled (200) <GET file:///...........%5Chtml_files%5C4.html> (referer: None)
html 1
html 2
html 3
html 4
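
If you prefer not to assemble the file:// URLs by hand, pathlib can do the scheme and escaping for you. This is only a sketch of an alternative, not part of the answer above; the spider name (localfiles) and the assumption that html_files sits one level above the spider file are illustrative:

from pathlib import Path

import scrapy


class LocalFilesSpider(scrapy.Spider):
    # Hypothetical variant of the spider above; name and folder location are assumptions.
    name = 'localfiles'
    folder = Path(__file__).resolve().parent.parent / 'html_files'

    def start_requests(self):
        # Path.as_uri() yields a properly escaped file:// URL on any platform.
        for html_file in sorted(self.folder.glob('*.html')):
            yield scrapy.Request(url=html_file.as_uri(), callback=self.parse)

    def parse(self, response):
        # Same table extraction the question attempted, emitted as an item.
        rows = response.xpath('//div[@class="data"]//tbody/tr')
        yield {'file': response.url, 'rows': len(rows)}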

Upvotes: 2
