Reputation: 23
I have two spiders, say A and B. A scrapes a bunch of URLs and writes them to a CSV file, and B scrapes inside those URLs, reading from the CSV file generated by A. But B throws a FileNotFoundError before A has actually created the file. How can I make my spiders behave so that B waits until A comes back with the URLs? Any other solution would be helpful.
WriteToCsv.py file
import csv

def write_to_csv(item):
    with open('urls.csv', 'a', newline='') as csvfile:
        fieldnames = ['url']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writerow({'url': item})

class WriteToCsv(object):
    def process_item(self, item, spider):
        if item['url']:
            write_to_csv("http://pypi.org" + item["url"])
        return item
Pipelines.py file
ITEM_PIPELINES = {
'PyPi.WriteToCsv.WriteToCsv': 100,
'PyPi.pipelines.PypiPipeline': 300,
}
read_csv method
import csv

def read_csv():
    x = []
    with open('urls.csv', 'r') as csvFile:
        reader = csv.reader(csvFile)
        x = [''.join(row) for row in reader]
    return x
start_urls in B spider file
start_urls = read_csv() #Error here
Upvotes: 0
Views: 761
Reputation: 378
I would consider using a single spider with two methods, parse and final_parse. As far as I can tell from the context you have provided, there is no need to write the URLs to disk.
parse should contain the logic for scraping the URLs that spider A is currently writing to the csv, and should return a new request with a callback to the final_parse method.
def parse(self, response):
    url = do_something(response.body_as_unicode())
    return scrapy.Request(url, callback=self.final_parse)
final_parse should then contain the parsing logic that was previously in spider B.
def final_parse(self, response):
    item = do_something_else(response.body_as_unicode())
    return item
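Putting the pieces together, a minimal single-spider sketch could look like the following. The spider name, start URL, and selectors are assumptions for illustration only, not taken from your project:
import scrapy

class PypiSpider(scrapy.Spider):
    # Hypothetical name and start URL; replace with your own listing page.
    name = "pypi_combined"
    start_urls = ["https://pypi.org/simple/"]

    def parse(self, response):
        # Follow each link directly instead of writing it to a csv first.
        for href in response.css("a::attr(href)").extract():
            yield scrapy.Request(response.urljoin(href), callback=self.final_parse)

    def final_parse(self, response):
        # The parsing logic that previously lived in spider B (fields are placeholders).
        yield {
            "url": response.url,
            "title": response.css("h1::text").extract_first(),
        }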
Note: If you need to pass any additional information from parse to final_parse, you can use the meta argument of scrapy.Request.
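For example, a small sketch of the meta mechanism, reusing the do_something placeholder from above (the 'listing_url' key is just an illustrative name):
def parse(self, response):
    url = do_something(response.body_as_unicode())
    # Attach extra data to the request; it is available again in the callback.
    return scrapy.Request(url, callback=self.final_parse,
                          meta={'listing_url': response.url})

def final_parse(self, response):
    # Read back the value that parse attached to the request.
    listing_url = response.meta['listing_url']
    ...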
If you do need the URLs, you could add them as a field on your item; each one can be accessed with response.url.
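For instance, assuming do_something_else returns a dict-like item, recording the URL could look like this:
def final_parse(self, response):
    item = do_something_else(response.body_as_unicode())
    # Store the URL of the page that was scraped on the item itself.
    item['url'] = response.url
    return item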
Upvotes: 1