Scrapy logs in, but doesnt crawl anything

Question

Introduction

My crawler finally manages to login, but it wont do any scraping and i cant find the reason for it. My consoleoutput shows no error and i did a crawler for following every internal links some weeks ago, so i thought about, i just have to build my crawler almost identical, but yeah, here i am :>

My XPath expressions should be/have to be correct, because i started to "learn" crawling at this domain

I thought i have to choose CrawlSpider instead of scrapy.Spider but if i change my rules to Linkextractor(r'/a-') <-every productlink contains that "a-"-Tag, i get an error, so i tried to go without rules.

My Code

import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.http import Request, FormRequest
from scrapy.utils.response import open_in_browser
from scrapy.linkextractors import LinkExtractor
from ..items import ScrapyloginItem

class TopartLoginIn(CrawlSpider):
    name = "test123"
    allowed_domains = ['topart-online.com']
    login_page = 'https://www.topart-online.com/de/Login/p-Login'
    start_urls = ['https://www.topart-online.com/']
    
    rules = (
        Rule(
            LinkExtractor(),
            callback='parse_page',
            follow=True
        ),
    )
    
    def start_requests(self):
        yield Request(
            url=self.login_page,
            callback=self.login,
            dont_filter=True
        )
        
    def login(self, response):
        return FormRequest.from_response(
            response,
            formdata={
                'ff_4d4d375f4c6f67696e5f55736572' : 'not real',
                'ff_4d4d375f4c6f67696e5f50617373' : 'login data',
                'ff_4d4d375f4c6f67696e' : ""                
            },
            callback=self.parse_page)
        
    def after_loging(self, response):
        open_in_browser(response)
        accview = response.xpath('//div[@class="myaccounticons row text-center"]')
        
        if accview:
            print('success')
            
        else:
            print(':(')
            
            for url in self.start_urls:
                yield Request(url=url, callback=self.parse_page)
                        
    def parse_page(self, response):
        productpage = response.xpath('//button[@class="btn btn-primary col-3 js-qty-up"]')

        for a in productpage:
            
            items = ScrapyloginItem()
            items['Title'] = response.xpath('//h1[@class="text-center text-md-left mt-0"]/text()').get()
            yield items

Here you can see, that the loginproccess is succeeded and also it calls, that the page i got referred to after login, i also no productlink. Thats exactly what i want, im just missing something, that it does the actual crawlprocess.

Scrapy logs in, but doesnt crawl anything

Answers (1)

Related Questions