Reputation: 167
I basically have a list of titles, stored in a csv, that I want to search for on a website.
I'm extracting those values and then trying to append them to the search link in the
start_urls attribute.
However, when I run the script, it only uses the last value of the list. Is there any particular reason why this happens?
import pandas as pd

class MySpider(CrawlSpider):
    name = "test"
    allowed_domains = ["example.com"]

    df = pd.read_csv('test.csv')
    saved_column = df.ProductName
    for a in saved_column:
        start_urls = ["http://www.example.com/search?noOfResults=20&keyword=" + str(a)]

    def parse(self, response):
        ...
Upvotes: 1
Views: 1283
Reputation: 5814
There is a conceptual error in your code. The loop rebinds start_urls on every iteration instead of accumulating values, so by the time the spider starts, start_urls holds only the last value of the loop.
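To see why, here is a minimal sketch of the same pattern outside Scrapy (names is a stand-in for your df.ProductName column): each pass through the loop rebinds start_urls to a fresh one-element list, so after the loop only the final keyword is left.

```python
names = ["alpha", "beta", "gamma"]  # stand-in for df.ProductName

for a in names:
    # Rebinds start_urls to a brand-new list each time; nothing is appended.
    start_urls = ["http://www.example.com/search?keyword=" + str(a)]

print(start_urls)  # only the last keyword survives
```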
A better approach would be to override the 'start_requests' method of the spider:
    # requires: from scrapy import Request
    def start_requests(self):
        df = pd.read_csv('test.csv')
        saved_column = df.ProductName
        for keyword in saved_column:
            # The column holds product names, so build the search url first.
            url = "http://www.example.com/search?noOfResults=20&keyword=" + str(keyword)
            yield Request(url, self.parse)
Idea taken from here: How to generate the start_urls dynamically in crawling?
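Alternatively, if you prefer to keep using start_urls, you can build the whole list in one go with a list comprehension (a minimal sketch; saved_column stands in for df.ProductName, and quote_plus is an extra precaution for keywords containing spaces or other special characters):

```python
from urllib.parse import quote_plus

# Stand-in for df.ProductName; in the real spider this comes from the csv.
saved_column = ["red widget", "blue gadget"]

# Build ALL the urls at once instead of rebinding start_urls each iteration.
base = "http://www.example.com/search?noOfResults=20&keyword="
start_urls = [base + quote_plus(str(a)) for a in saved_column]

print(start_urls)
```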
Upvotes: 1