Asylzat

Reputation: 229

How to run browser functionality on server?

I have a website that parses content from social media and other sites to keep its information up to date. As you know, Instagram can't be scraped with Python's BeautifulSoup parser (or similar tools), because you have to log in and the page needs JavaScript to load its content. The same is true of other sites.

For that reason I run a JS script on the client side, in the Google Chrome console. The script saves the data in localStorage, which I use afterwards.

The problem is that I sometimes have a slow internet connection, and I have to run the script on my own PC, which takes a lot of time. A single run is not that long, but doing it every day becomes a real problem.

I want to move this job to the server side, where the connection is always fast and where I could start the script from my phone, whenever and wherever I am. I am trying to figure out how to do that.

I need a browser on the server side that runs JS and does everything the same way a client browser does.

  1. Start a browser
  2. Wait for the page to load
  3. Run a JS script

Most servers have no graphical interface, and from a terminal you only get the raw HTML back.

But there are many hosting services, Java servers and more, not only Apache or nginx. As far as I know, in Java you can build a browser of your own, so running browser functionality on a server should be possible somehow.

The question is: are there any ready-made tools that I can just use? Or, if I have to write this functionality myself, are there libraries or frameworks that would let me do it quickly? My goal is not to build a server-side browser; I just want to be able to run some JS.

I work with PHP, JS and Python, and with Java SE only, on the client. Java EE has a lot of libraries and capabilities. Maybe somebody knows which language or framework I could do this in. Maybe it is possible on Linux, or there are hosts that can act as a client. I think some companies do this kind of thing to automate jobs.

I was thinking about PhantomJS or Node.js, but I am new to them, and I am afraid I would spend a lot of time with no result.

Any advice, links, opinions or ideas would help me a lot. Thanks!


PhantomJS does work, but I can't log in with it, maybe because it doesn't keep cookies or session data, or maybe because some headers are missing. So instead of a web scraping program I would rather use a real browser on the server.

PhantomJS is really inconvenient: when testing you get no information back, it takes too long, and you can't simply run a JS script.

page.evaluate(function() {
    setTimeout(function() {
        document.getElementById("login").click();
        console.log("click initialized");
    }, 1000);
});

setTimeout inside evaluate does not seem to work, and even checking that takes a lot of time.

page.evaluateJavascript(function() {

});

evaluateJavascript gets stuck (it never reaches phantom.exit(0)), so you have to restart cmd, cd to the directory and type all the commands again.

Very simple manipulations grow into huge problems. I don't know what PhantomJS was created for: only screen capture, or other really simple stuff? Even its parser is awful. I found no useful tutorials. There is no graphical interface, the script inside evaluate does not seem to run like normal JS, and it is really difficult to learn what is happening inside. You get no access and no information; it just returns an empty line or nothing at all, with no errors. I had heard of PhantomJS before, but so far it has been of no use to me.

I was thinking about a WebDriver such as Selenium. There is no need to run Selenium on the client, and running it on a server seems rather expensive. I can't find a host that offers a VDS with a GUI (not Ubuntu Server), which you need for the browser.


I realized that there is no solution


I have a script written in JS that does more than parsing. It is a kind of bot: it analyses user data, follows, unfollows, posts data and runs through the users. The question was "How to run browser functionality on server?": I wanted a program to emulate a browser for 100 accounts on the server at once, but I guess this is not possible. Maybe I will close the question with the answer "There is no solution, you can't run a browser on a server". Using WebDriver is too expensive. There is no single program that covers all the WebDriver browsers at once (e.g. Chrome, Firefox, Opera, Yandex, and that seems to be all), and they also use a lot of RAM, which is too expensive on a VDS.

Upvotes: 2

Views: 2924

Answers (2)

Prayson W. Daniel

Reputation: 15608

You do not have to log in to scrape Instagram. For pages that need JS, I have used the requests_html package, which does the job for you. You could start with instagram_scraper (https://github.com/meetmangukiya/instagram-scraper), which is inspired by twitter-scraper (https://github.com/kennethreitz/twitter-scraper) by Kenneth Reitz, the author of both requests and requests_html. The main idea is to scrape without tokens or logins.

Both scripts have inspired me to create a scraper that does not require login. It is a good place to start, at least.

Updated 2018-09-22: I followed "Setting up a Digital Ocean server for Selenium, Chrome, and Python", but on my own server. The trick is to create a fake display. This example from Jonathan runs on a server :)

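from pyvirtualdisplay import Display
from selenium import webdriver

# Start a virtual (invisible) X display so Chrome has something to render to
display = Display(visible=0, size=(800, 600))
display.start()

options = webdriver.ChromeOptions()
options.add_argument('--no-sandbox')  # usually required when Chrome runs as root

# `options=` replaces the deprecated `chrome_options=` keyword
driver = webdriver.Chrome(options=options)
try:
    driver.get('http://nytimes.com')
    print(driver.title)
finally:
    # Release the browser and the virtual display
    driver.quit()
    display.stop()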
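
Chrome also has a headless mode, so a virtual display is not strictly required. Below is a minimal sketch of the three steps from the question (start a browser, wait for the page to load, run a JS script), assuming Selenium and a matching chromedriver are installed on the server. The helper names `build_chrome_args` and `run_script` are made up for this illustration, and the flags are a common starting set rather than a verified configuration for any particular host.

```python
def build_chrome_args(window_size=(1280, 800)):
    """Command-line flags for running Chrome without a display (hypothetical helper)."""
    width, height = window_size
    return [
        "--headless",      # no GUI or X server needed
        "--no-sandbox",    # usually required when Chrome runs as root
        "--disable-gpu",
        "--window-size={},{}".format(width, height),
    ]


def run_script(url, script, timeout=30):
    """Start Chrome, wait for the page to load, run `script`, return its result."""
    # Imported here so the module itself loads even where Selenium is missing.
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    for arg in build_chrome_args():
        options.add_argument(arg)

    driver = webdriver.Chrome(options=options)      # 1. start a browser
    try:
        driver.get(url)
        WebDriverWait(driver, timeout).until(       # 2. wait for the page to load
            lambda d: d.execute_script("return document.readyState") == "complete"
        )
        return driver.execute_script(script)        # 3. run a JS script
    finally:
        driver.quit()                               # always free the browser's RAM
```

Something like `run_script("http://nytimes.com", "return document.title")` should then hand the page title back to Python. Each call starts and stops a full Chrome process, so memory use still grows with the number of pages handled in parallel.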

Upvotes: 3

Alvin

Reputation: 259

I think you can use PhantomJS. I have used it to build many spiders that need a login or JS loading.

You can log in using PhantomJS, and it can set cookies, session data and request headers. All you need to do is search for the method, for example: phantomjs login Instagram

When you use PhantomJS, the most important thing is that before every next step you make sure the page or the related elements have finished loading. Pages and JS take time to load, and sometimes you also need to pass extra arguments to get the page to load; otherwise the elements that the next step depends on are not there yet.
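
The "make sure it has loaded before the next step" rule is not specific to PhantomJS. Here is a minimal, library-independent sketch of such a wait in Python; `wait_until` and its parameters are illustrative names, not part of any library. With Selenium, the built-in `WebDriverWait` plays the same role.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.25):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within {} s".format(timeout))
        time.sleep(interval)  # give the page time to load before polling again
```

With a Selenium driver, for example, `wait_until(lambda: driver.find_elements("id", "login"))` would block until an element with the id `login` (the one clicked in the question) exists, and only then let the next step run.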

Upvotes: 2
