Peter S

Reputation: 897

Python3 JavaScript web scraper

Does Python 3 have a scraping library, other than Selenium, that can handle pages rendered with JavaScript? I'm trying to scrape https://www.mailinator.com/v2/inbox.jsp?zone=public&query=test, but the inbox is loaded with JavaScript. I don't want to use Selenium because I don't want it to open a browser window when I run it.

Here is my non-working code:

import requests
from bs4 import BeautifulSoup as soup

INBOX = "https://www.mailinator.com/v2/inbox.jsp?zone=public&query={}"


def check_inbox(name):
    stuff = soup(requests.get(INBOX.format(name)).text, "html.parser")
    print(stuff.find("ul", {"class": "single_mail-body"}))


check_inbox("retep")

Do any such libraries exist?

Searching Google for "python 3 javascript scraper" didn't turn up anything other than Selenium.

Upvotes: 2

Views: 880

Answers (1)

Loïc

Reputation: 11943

You don't actually need to run any JavaScript. The script only runs client-side, so you can make the same requests it makes yourself.

If you inspect the page (developer tools > Network), you'll see that it opens a websocket connection to this URL:

wss://www.mailinator.com/ws/fetchinbox?zone=public&query=test

If you implement a websocket client in Python, you'll be able to fetch your mails cleanly (see this example: https://github.com/aaugustin/websockets/blob/master/example/client.py).

EDIT:

As mentioned by John, augustin's ws client repo is dead. Today I'd use this: https://websockets.readthedocs.io/en/stable/
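
To make that concrete, here is a minimal sketch of such a client using the websockets package. Only the endpoint URL comes from the network tab above. The helper names (build_inbox_url, decode_frame, read_inbox), the max_frames and timeout parameters, and the guess that frames arrive as JSON text are my own assumptions, so check what the server really sends before relying on any of it.

```python
import asyncio
import json
from urllib.parse import urlencode

# Endpoint observed in the browser's network tab (see above).
WS_ENDPOINT = "wss://www.mailinator.com/ws/fetchinbox"


def build_inbox_url(name, zone="public"):
    """Build the websocket URL for an inbox name (hypothetical helper)."""
    return f"{WS_ENDPOINT}?{urlencode({'zone': zone, 'query': name})}"


def decode_frame(raw):
    """Assumption: frames are JSON text. Anything else is returned unchanged."""
    try:
        return json.loads(raw)
    except (TypeError, ValueError):
        return raw


async def read_inbox(name, max_frames=5, timeout=10):
    """Collect up to max_frames frames from the inbox websocket."""
    # Third-party dependency (pip install websockets), imported here so the
    # helpers above can be used without it.
    import websockets
    from websockets.exceptions import ConnectionClosed

    frames = []
    async with websockets.connect(build_inbox_url(name)) as ws:
        for _ in range(max_frames):
            try:
                raw = await asyncio.wait_for(ws.recv(), timeout)
            except (asyncio.TimeoutError, ConnectionClosed):
                break  # server went quiet or closed the connection
            frames.append(decode_frame(raw))
    return frames
```

You would run it with something like asyncio.run(read_inbox("test")) and print the returned frames to see what the server sends. I capped the number of frames and added a timeout because I don't know whether the server ever closes the connection on its own.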

Upvotes: 1
