Josh

Reputation: 1305

Access Webpage With Credentials and Cookies From Command Line

I am trying to access a proprietary website that provides access to a large database (many billions of entries). Each entry in the database is a link to a webpage that is essentially a flat file containing the information I need. I have about 2000 entries of interest, each with its corresponding webpage. I have two related issues that I am trying to resolve:

  1. How to get wget (or any similar program) to read cookie data. I exported my cookies from Google Chrome (using https://chrome.google.com/webstore/detail/cookiestxt/njabckikapfpffapmjgojcnbfjonfjfg?hl=en), but for some reason the HTML downloaded by wget still cannot be rendered as a webpage. I also have not been able to get Google Chrome to read the cookies when launched from the command line. These cookies are needed to access the database, since they contain my credentials. (See the sketch after this list.)
  2. In my context, it would be fine if the webpage were downloaded as a PDF, but I cannot figure out how to download a webpage as a PDF using wget or similar tools. I tried automate-save-page-as (https://github.com/abiyani/automate-save-page-as), but I keep getting an error that the browser is not in my PATH.
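For reference, this is the kind of thing I was attempting for issue 1. wget can read a Netscape-format cookies.txt with its --load-cookies flag, and Python's requests accepts the same file via http.cookiejar.MozillaCookieJar. A minimal sketch, assuming the extension's export is saved as cookies.txt; the entry URL is a placeholder for one database page:

    import http.cookiejar
    import requests

    # Load the Netscape-format cookies.txt exported by the Chrome extension.
    # The path is an assumption; point it at wherever the export was saved.
    jar = http.cookiejar.MozillaCookieJar("cookies.txt")
    jar.load(ignore_discard=True, ignore_expires=True)

    # "https://example.com/entry/12345" stands in for one database entry page.
    response = requests.get("https://example.com/entry/12345", cookies=jar)
    response.raise_for_status()

    # The flat-file content of the entry, if the cookies carried valid credentials.
    print(response.text)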

Upvotes: 0

Views: 906

Answers (1)

Josh

Reputation: 1305

I solved both of these issues:

Problem 1: I switched away from wget, curl, and Python's requests to the Selenium WebDriver in Python. With Selenium, I did not have to deal with passing cookies, headers, or POST and GET requests, since it actually opens a browser. A further plus: while writing the Selenium script, I could inspect the page and watch what it was doing as it ran.
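A minimal sketch of the approach, assuming Selenium 4 with Chrome and its driver installed; the login URL and field names are placeholders for the proprietary site, not its real values:

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    # Opens a real Chrome window, so cookies, headers, and JavaScript behave
    # exactly as they would for a human user.
    driver = webdriver.Chrome()

    # Placeholder for the database's login page.
    driver.get("https://example.com/login")

    # Hypothetical field names; use the browser's inspector to find the real ones.
    driver.find_element(By.NAME, "username").send_keys("my_user")
    driver.find_element(By.NAME, "password").send_keys("my_password")
    driver.find_element(By.NAME, "submit").click()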

Problem 2: Selenium's driver objects expose page_source (an attribute rather than a method), which returns the HTML of the loaded webpage. When I tested it, the HTML rendered correctly.
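For example, continuing with the driver from the sketch above (the entry URL and output filename are placeholders):

    # page_source returns the HTML of the currently loaded page as a string.
    driver.get("https://example.com/entry/12345")  # placeholder entry URL
    html = driver.page_source

    # Write one flat HTML file per entry; the filename is an arbitrary choice.
    with open("entry_12345.html", "w", encoding="utf-8") as f:
        f.write(html)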

Upvotes: 1
