Scrapy-based crawler not extracting content within <p> tags

Question

I have a custom crawler that scrapes news articles. For the most part it works, but when I add new URLs it is sometimes hard to figure out which CSS selectors to use to get the content I want. Below is the code I'm working on.

# -*- coding: utf-8 -*-
""" Script to crawl articles from https://mycbs4.com
"""
try:
    from crawler import BaseCrawler
except ImportError:
    from __init__ import BaseCrawler


class Cmycbs4Crawler(BaseCrawler):
    start_urls = [
        'https://mycbs4.com/search?find=cannabis',
        'https://mycbs4.com/search?find=marijuana',
        'https://mycbs4.com/search?find=cbd',
        'https://mycbs4.com/search?find=thc',
        'https://mycbs4.com/search?find=hemp'
    ]

    source_id = 'mycbs4'

    config_selectors = {
        # CSS selector on the articles page (the page that lists many articles)
        'POST_URLS': '.sd-main a::attr(href)',
        #'NEXT_PAGE_URL': '.pager-next > a::attr(href)', # default

        # CSS selector on the article's detail page (the page that displays the full content of the article)
        'ARTICLE_CONTENT': '#js-Story-Content-0 > p',
    }

if __name__ == "__main__":
    crawler = Cmycbs4Crawler()
    crawler.run()

The crawler should crawl the URLs and store everything it extracts in a DB. It scrapes every field except the article content.

I've tried the following selectors:

'#js-Story-Content-0 > p'
'.StoryText_storyText__1uZ3 > p'
'#js-Story-Content-0 .StoryText_storyText__1uZ3 > p'

None of them returns any content from the article, so I'm not sure what I'm doing wrong.
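
One way to narrow this down is to check the HTML the crawler actually receives rather than the DOM shown in the browser's dev tools. A selector such as '#js-Story-Content-0 > p' can fail for at least two reasons: the <p> tags are not direct children of the element, or the element is not in the raw response at all (for example, if the page builds it with JavaScript after loading, which is only a possibility here, not something confirmed for this site). The sketch below uses only the Python standard library. The class DirectChildCounter, the function count_direct_children and the three HTML snippets are all hypothetical, made up for illustration; they are not part of Scrapy or of my crawler.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not tracked as open elements.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DirectChildCounter(HTMLParser):
    """Counts <child_tag> elements that are direct children of the element
    with id == target_id. Hypothetical helper; assumes explicit closing tags."""

    def __init__(self, target_id, child_tag="p"):
        super().__init__()
        self.target_id = target_id
        self.child_tag = child_tag
        self.stack = []            # open elements as (tag, is_target) pairs
        self.found_target = False
        self.child_count = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # The parent is the nearest element that is still open.
        if self.stack and self.stack[-1][1] and tag == self.child_tag:
            self.child_count += 1
        is_target = dict(attrs).get("id") == self.target_id
        self.found_target = self.found_target or is_target
        self.stack.append((tag, is_target))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed elements.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def count_direct_children(html, target_id, child_tag="p"):
    """Return (target_found, number_of_direct_child_tags)."""
    parser = DirectChildCounter(target_id, child_tag)
    parser.feed(html)
    parser.close()
    return parser.found_target, parser.child_count


# Made-up snippets standing in for three possible shapes of the raw response.
static_html = '<div id="js-Story-Content-0"><p>First.</p><p>Second.</p></div>'
nested_html = ('<div id="js-Story-Content-0"><div class="wrap">'
               '<p>Nested.</p></div></div>')
empty_shell = '<div id="root"></div><script>/* rendered client-side */</script>'

print(count_direct_children(static_html, "js-Story-Content-0"))  # (True, 2)
print(count_direct_children(nested_html, "js-Story-Content-0"))  # (True, 0)
print(count_direct_children(empty_shell, "js-Story-Content-0"))  # (False, 0)
```

Reading the results: (True, 0) would suggest the element exists but the child combinator '>' is too strict, so a descendant selector like '#js-Story-Content-0 p' might be worth trying; (False, 0) would suggest the element is missing from the raw HTML, in which case no CSS selector can match it. To run this against the real page, the raw body would have to be passed in, for example response.text inside a Scrapy callback, assuming BaseCrawler exposes the response.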

Below is a screenshot of the content (the <p> tags) I'm trying to scrape.

Any help would be greatly appreciated.
