How do I run Headless Chrome in a Shell on Google Cloud Platform?

Question

I have read a little about Headless Chrome and the Puppeteer API that Google has developed. I have seen a few answers on Stack Overflow about running Headless Chrome, and I am familiar with Selenium for testing and scraping web pages. I have written my own HTML parsing, search, and update package, but I often run into problems when the data I am trying to parse and retrieve is generated by JavaScript on the page.

According to Google's documentation, Headless Chrome is supported in the Google Cloud Platform Shell (a Debian-type Linux command line, similar to what Amazon Web Services offers). Today I tried to download a web page with a simple Headless Chrome command line, but the shell returned an error. The command was:

@cloudshell:~$ chrome --headless --disable-gpu --dump-dom https://sepehr.irib.ir/?idc=32&idt=tv&idv=1

I typed this into a Bash shell instance on GCP and received the following output:

[1] 498
[2] 499
bash: chrome: command not found
[2]+  Done                    idt=tv
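
The job-control lines in this output ([1] 498, [2] 499, and the 'Done' notice for idt=tv) indicate that Bash treated each unquoted & in the URL as a background operator, so Chrome would have received only the part of the URL before the first &. Separately, 'command not found' means that no executable named chrome is on the PATH. One way to sidestep the quoting issue from Java is to launch the process with an argument list instead of a shell string. The following is a minimal sketch, not a confirmed recipe for Cloud Shell: the binary name google-chrome and the class and method names are assumptions, and the browser would need to be installed first.

```java
import java.io.IOException;
import java.util.List;

public class HeadlessDump {
    // Assumed binary name; it may be "chromium" or a full path on a given system.
    static final String CHROME_BINARY = "google-chrome";

    /**
     * Builds the argument vector for a one-shot DOM dump.
     * Each list element reaches the process as a single argument,
     * so the '&' characters in the URL are never parsed by a shell.
     */
    static List<String> buildCommand(String binary, String url) {
        return List.of(binary, "--headless", "--disable-gpu", "--dump-dom", url);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String url = "https://sepehr.irib.ir/?idc=32&idt=tv&idv=1";

        // inheritIO() forwards Chrome's stdout (the dumped DOM) and stderr to this process.
        // start() throws IOException if the binary cannot be found.
        Process process = new ProcessBuilder(buildCommand(CHROME_BINARY, url))
                .inheritIO()
                .start();

        System.exit(process.waitFor());
    }
}
```

When typing the command directly in Bash instead, wrapping the URL in single quotes has the same effect of keeping it as one argument.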

The URL above is just a URL from this Stack Overflow question. I was toying around to see if I could answer it. It is a very common type of question on the Web-Scraping tag. It is not too important to me, though it probably is to the OP.

According to a few YouTube videos, the Google Chrome Headless JSON API allows users to start an instance of Chrome that functions like a PaaS rather than a UI that can be viewed. This seems pretty nice, and I am fully aware that Selenium already takes advantage of it. However, I would like to access the JSON API from Java without using Selenium, primarily to see if I can understand it and, hopefully, to begin making JSON requests (in Java) to a Headless Chrome running in a Google Cloud Shell instance without adding all the complexity of the Java Selenium package.
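
As a rough illustration of querying the browser from plain Java without Selenium, the sketch below requests the /json/version endpoint that Chrome serves over HTTP when it is started with the --remote-debugging-port flag. It assumes Java 11 or later and a headless Chrome already listening on localhost port 9222 (host and port are assumptions); the class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DevToolsVersion {
    /** Builds the URI of the version endpoint for a given debugging host and port. */
    static URI versionEndpoint(String host, int port) {
        return URI.create("http://" + host + ":" + port + "/json/version");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumes Chrome was started with: --headless --remote-debugging-port=9222
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(versionEndpoint("localhost", 9222))
                .GET()
                .build();

        // The body is returned as a raw JSON string; parsing is left to the caller.
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println("HTTP status: " + response.statusCode());
        System.out.println(response.body());
    }
}
```

The response body is a JSON document that includes a webSocketDebuggerUrl field. Commands beyond these discovery endpoints are sent over that WebSocket connection, not as plain HTTP requests.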

This Stack Overflow question (and its answers) seems to be a "partial duplicate" of mine. Unfortunately, the Google help pages state that the service has been fully supported since 2019, and the answers there are from 2018. I suspect I should not have to perform a complete build of Chrome in order to run a headless Chrome instance from the command line, but I could be wrong. In any case, newer answers that reflect the work done by Google developers in 2019 and 2020 would help. More importantly, I would like to use "Plain Old Java Objects" to query the browser instead of using Puppeteer and Node.js. I can deal with JSON very well in Java.

Is there a Bash 'sudo' command that I can use to get an instance of Chrome running in the GCP shell?

I have reviewed the suggested duplicates of this question, and do not know what to do... :)
