WolfHawk

Reputation: 11

First Python Scrapy Web Scraper Not Working

I took the Data Camp Web Scraping with Python course and am trying to run the 'capstone' web scraper in my own environment (the course takes place in a special in-browser environment). The code is intended to scrape the titles and descriptions of courses from the Data Camp webpage.

I've spent a good deal of time tinkering here and there, and at this point I'm hoping that the community can help me out.

The code I am trying to run is:

# Import scrapy
import scrapy

# Import the CrawlerProcess
from scrapy.crawler import CrawlerProcess

# Create the Spider class
class YourSpider(scrapy.Spider):
    name = 'yourspider'

    # start_requests method
    def start_requests(self):
        yield scrapy.Request(url= https://www.datacamp.com, callback = self.parse)

    def parse (self, response):
        # Parser, Maybe this is where my issue lies
        crs_titles = response.xpath('//h4[contains(@class,"block__title")]/text()').extract()
        crs_descrs = response.xpath('//p[contains(@class,"block__description")]/text()').extract()
        for crs_title, crs_descr in zip(crs_titles, crs_descrs):
            dc_dict[crs_title] = crs_descr

# Initialize the dictionary **outside** of the Spider class
dc_dict = dict()

# Run the Spider
process = CrawlerProcess()
process.crawl(YourSpider)
process.start()

# Print a preview of courses
previewCourses(dc_dict)

I get the following output:

C:\Users*\PycharmProjects\TestScrape\venv\Scripts\python.exe C:/Users/*/PycharmProjects/TestScrape/main.py
  File "C:\Users******\PycharmProjects\TestScrape\main.py", line 20
    yield scrapy.Request(url=https://www.datacamp.com, callback=self.parse1)
                                  ^
SyntaxError: invalid syntax

Process finished with exit code 1

I notice that the parse method in line 20 remains grey in my PyCharm window. Maybe I am missing something important in the parse method?

Any help in getting the code to run would be greatly appreciated!

Thank you,

-WolfHawk

Upvotes: 1

Views: 284

Answers (1)

elyptikus

Reputation: 1148

The error message is triggered in the following line:

yield scrapy.Request(url=https://www.datacamp.com, callback = self.parse)

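The url argument must be a string, and a string literal has to be enclosed in matching quotes, either ' or ", at the beginning and at the end. Without the quotes, Python tries to parse https://www.datacamp.com as code and fails.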

Try this:

yield scrapy.Request(url='https://www.datacamp.com', callback = self.parse)
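To see that this is a plain Python syntax problem rather than anything Scrapy-specific, here is a minimal sketch that uses only the built-in compile() function. The helper name parses and the two one-line snippets are made up for illustration; the snippets are only parsed, never executed.

```python
# Hypothetical one-line snippets for illustration; they are parsed but never run.
unquoted = "request = dict(url=https://www.datacamp.com)"
quoted = "request = dict(url='https://www.datacamp.com')"


def parses(source):
    """Return True if Python can parse the source, False if it raises SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


print("unquoted URL parses:", parses(unquoted))
print("quoted URL parses:", parses(quoted))
```

Only the quoted version gets past the parser, which is why the script stops before the spider ever starts.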

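If this is your full code, the function previewCourses is also missing, so the last line will raise a NameError once the syntax error is fixed. Check whether it is provided to you, or write it yourself above the line that calls it, with something like this: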

def previewCourses(dict_to_print):
    for key, value in dict_to_print.items():
        print(key, value)
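As a rough way to check the dictionary-building and preview logic without running the crawler, the sketch below reuses the zip loop from the question's parse method on placeholder data. The two lists are invented stand-ins for whatever the XPath queries would return, not real DataCamp content.

```python
def previewCourses(dict_to_print):
    # Print each course title followed by its description
    for key, value in dict_to_print.items():
        print(key, value)


# Placeholder lists (assumption): stand-ins for the results of
# response.xpath(...).extract() in the spider's parse method
crs_titles = ["Title A", "Title B"]
crs_descrs = ["Description A", "Description B"]

# Same pairing logic as in the question: one description per title
dc_dict = dict()
for crs_title, crs_descr in zip(crs_titles, crs_descrs):
    dc_dict[crs_title] = crs_descr

previewCourses(dc_dict)
```

If this prints the pairs as expected but the real run prints nothing, the XPath expressions are the likelier culprit, since zip over two empty lists leaves the dictionary empty.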

Upvotes: 1
