alexanderlukanin13

Reputation: 4715

Scrapy Splash click button doesn't work

What I'm trying to do

On avito.ru (a Russian real estate site), a person's phone number is hidden until you click on it. I want to collect the phone number using Scrapy + Splash.

Example URL: https://www.avito.ru/moskva/kvartiry/2-k_kvartira_84_m_412_et._992361048

screenshot: Phone is hidden

After you click the button, a pop-up is displayed and the phone number is visible.


I'm using the Splash execute API with the following Lua script:

function main(splash)
    splash:go(splash.args.url)
    splash:wait(10)
    splash:runjs("document.getElementsByClassName('item-phone-button')[0].click()")
    splash:wait(10)
    return splash:png()
end

Problem

The button is not clicked and the phone number is not displayed. It's a trivial task, and I have no explanation for why it doesn't work.

Clicking works fine for another element on the same page if we replace item-phone-button with js-show-stat. So JavaScript works in general, and the blue "Display phone" button must be special somehow.

What I've tried

To isolate the problem, I created a repo with a minimal example script and a docker-compose file for Splash: https://github.com/alexanderlukanin13/splash-avito-phone

The JavaScript code is valid; you can verify it in the JavaScript console in Chrome and Firefox:

document.getElementsByClassName('item-phone-button')[0].click()

I've tried it with Splash versions 3.0, 3.1 and 3.2; the result is the same.

Update

I've also tried:

Upvotes: 5

Views: 8807

Answers (2)

Mikhail Korobov

Reputation: 22238

The following script works for me:

function main(splash, args)
  splash.private_mode_enabled = false
  assert(splash:go(args.url))
  btn = splash:select_all('.item-phone-button')[2]
  btn:mouse_click()
  btn.style.border = "5px solid black"
  assert(splash:wait(0.5))
  return {
    num = #splash:select_all('.item-phone-button'),
    html = splash:html(),
    png = splash:png(),
    har = splash:har(),
  }
end

There were 2 issues with the original solution:

  1. There are 2 elements with the 'item-phone-button' class, and the button of interest is the second one (Lua tables are 1-based, so [2] is the second match, whereas [0] in the original JavaScript is the first). I checked which element is matched by setting btn.style.border = "5px solid black".
  2. This website requires private mode to be disabled, likely because it uses localStorage. Check http://splash.readthedocs.io/en/stable/faq.html#website-is-not-rendered-correctly for other common suggestions.
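The first point can also be checked from a browser console before writing any Lua. Below is a minimal sketch, not part of the script above: describeMatches is a hypothetical helper name, and using a non-zero bounding box as the "visible" test is an assumption about how the hidden duplicate is laid out on this page.

```javascript
// Hypothetical helper: summarize every element carrying a given class so that
// you can see which index belongs to the visible button.
// `doc` is any object exposing getElementsByClassName (the page's `document`
// when run in a browser).
function describeMatches(doc, className) {
  // Copy the live collection into a plain array before mapping over it.
  var matches = Array.prototype.slice.call(doc.getElementsByClassName(className));
  return matches.map(function (el, index) {
    var rect = el.getBoundingClientRect();
    return {
      index: index, // 0-based here; splash:select_all results are 1-based in Lua
      tag: el.tagName,
      text: (el.textContent || '').trim().slice(0, 40),
      // Assumption: an element with an empty bounding box is not the one on screen.
      visible: rect.width > 0 && rect.height > 0
    };
  });
}

// In a browser console:
// console.table(describeMatches(document, 'item-phone-button'));
```

If the second row is the visible one, that corresponds to [1] in JavaScript and [2] in the Lua script.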

Upvotes: 11

Lore

Reputation: 1908

I don't know how your implementation works, but I suggest renaming main to parse, the default callback that spiders call on start.

If that isn't the problem, the first thing to do is check that you have picked the right element of that class, using JavaScript with a CSS selector. There may be another element with the item-phone-button class, in which case you are clicking in the wrong place.

If all of the above is correct, I suggest two options that worked for me:

  • Using Splash mouse_click and Splash wait (I see you have already used the latter). If it doesn't work, try a double click by substituting this into your code:

    local button = splash:select('.item-phone-button')
    button:mouse_click()
    button:mouse_click()
    

  • Using Splash wait_for_resume, which runs JavaScript code until it calls splash.resume() and then returns control to Lua. Your code becomes simpler too:

    function main(splash)
        splash:go(splash.args.url)
        -- The JavaScript snippet must define main(splash) and call splash.resume()
        splash:wait_for_resume([[
            function main(splash) {
                document.getElementsByClassName('item-phone-button')[0].click();
                splash.resume();
            }
        ]])
        return splash:png()
    end
    

    EDIT: it seems better to use dispatchEvent instead of click(), as in this example:

    function simulateClick() {
      var event = new MouseEvent('click', {
        view: window,
        bubbles: true,
        cancelable: true
      });
      var cb = document.getElementById('checkbox'); 
      var cancelled = !cb.dispatchEvent(event);
      if (cancelled) {
        // A handler called preventDefault.
        alert("cancelled");
      } else {
        // None of the handlers called preventDefault.
        alert("not cancelled");
      }
    }
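    The example above targets a checkbox. A minimal sketch of the same idea applied to an element picked by class and index might look like this. clickNthByClass is a hypothetical helper name, and whether the MouseEvent constructor is available in the browser engine bundled with Splash is an assumption that has not been verified here.

```javascript
// Hypothetical helper: dispatch a synthetic click on the n-th element
// (0-based) that carries the given class.
// Returns null when there is no element at that index; otherwise returns the
// result of dispatchEvent (false only if a handler called preventDefault).
function clickNthByClass(doc, className, n) {
  var el = doc.getElementsByClassName(className)[n];
  if (!el) {
    return null; // nothing matched at that index
  }
  var event = new MouseEvent('click', {
    view: doc.defaultView, // the window that owns `doc`
    bubbles: true,
    cancelable: true
  });
  return el.dispatchEvent(event);
}

// In a browser console, assuming the second match is the button of interest:
// clickNthByClass(document, 'item-phone-button', 1);
```

    Passing the index explicitly makes it easy to try each match in turn when several elements share the class.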
    

    Upvotes: 1
