Reputation: 455
Scenario:
We are required to enter data daily into a government database in a European country, and we suddenly need to retrieve some of that data. But the only format they will allow is PDFs generated from the data, hundreds of them. We would like to avoid sitting in front of a web browser clicking link after link.
The generated links look like this:
<a href='javascript:viajeros("174814255")'>
    <img src="img/pdf.png">
</a>
I have almost no experience with JavaScript, so I don't know whether I can install a routine as a bookmarklet to loop through the DOM, find all the links, and call the function. Nor, if that's possible, how to write it.
The ID numbers can't be predicted, so I can't write another page or curl/wget script to do it. (And if I could, it would still fail as mentioned below.)
The 'viajeros' function is simple:
function viajeros(id) {
    var idm = document.forms[0].idioma.value;
    window.open("parteViajeros.do?lang=" + idm + "&id_fichero=" + id);
}
but feeding that URI to curl or wget fails. Apparently they check either a cookie or the Referer header and generate an error.
Besides, with each link putting the PDF in a browser tab instead of in the downloads directory, we would still have to do two clicks (tab and save) hundreds of times.
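In other words, each link boils down to an id plus a fixed URL pattern. A minimal sketch of that string handling (the helper names extractId and buildPdfUrl are mine, not the site's):

```javascript
// Hypothetical helpers, mirroring what viajeros() does with each link.

// Extract the numeric id from an href like: javascript:viajeros("174814255")
function extractId(href) {
  var m = href.match(/viajeros\("(\d+)"\)/);
  return m ? m[1] : null;
}

// Rebuild the URL that viajeros() opens, without the window.open call.
function buildPdfUrl(lang, id) {
  return "parteViajeros.do?lang=" + lang + "&id_fichero=" + id;
}
```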
What should I do instead?
For what it's worth, this is on macOS 10.13.4. I normally use Safari, but I also have Opera and Firefox available. I could install Chrome, but that's the last resort. No, that's second to last: we also have a (shudder) Windows 10 laptop. THAT'S last resort.
(Note: I looked at the four suggested duplicates that seemed promising, but each either had no answer or instructed the asker to modify the code that generates the PDF.)
Upvotes: 1
Views: 919
Reputation: 5874
document.querySelectorAll("img[src=\"img/pdf.png\"]")
  .forEach((el, i) => {
    let id = el.parentElement.href.split("\"")[1];
    let url = "parteViajeros.do?lang=" + document.forms[0].idioma.value +
      "&id_fichero=" + id;
    setTimeout(() => {
      downloadURI(url, id);
    }, 1500 * i);
  });
This gets all of the images of the PDF icon, then looks at each one's parent for the link target. The ID is extracted from that href and used to build the path to the file to be downloaded, the same URL 'viajeros' constructs but without the window.open. Each URL is then passed to downloadURI, which performs the download; the calls are staggered with setTimeout so the downloads don't all fire at once.
This uses the downloadURI function from another Stack Overflow answer: you can download a URL by creating a link with the download attribute set and clicking it programmatically, which is implemented as follows. Note this is only tested in Chrome.
function downloadURI(uri, name) {
    var link = document.createElement("a");
    link.download = name;
    link.href = uri;
    document.body.appendChild(link);
    link.click();
    document.body.removeChild(link);
}
Open the page with the links and open the console. Paste the downloadURI function first, then run the code above to download all the links.
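If the download-attribute approach misbehaves outside Chrome, a fetch-based variant of the same idea should also work from the page's console, since a same-origin fetch sends the session cookies automatically. A sketch (the names downloadViaFetch and pdfName are mine):

```javascript
// Sketch: fetch the PDF bytes, then save them via a temporary object URL.
// Run from the console of the logged-in page so cookies are included.
function downloadViaFetch(url, name) {
  return fetch(url)
    .then(function (res) { return res.blob(); })
    .then(function (blob) {
      var link = document.createElement("a");
      link.href = URL.createObjectURL(blob);
      link.download = name;
      document.body.appendChild(link);
      link.click();
      document.body.removeChild(link);
      URL.revokeObjectURL(link.href);
    });
}

// Helper to name each saved file after its id.
function pdfName(id) {
  return id + ".pdf";
}
```

Used the same way as downloadURI above, e.g. downloadViaFetch(url, pdfName(id)) inside the loop.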
Upvotes: 1
Reputation: 4885
I had a similar situation, where I had to download all the (invoice) PDFs generated in a day or the past week.
After some research I was able to do the scraping using PhantomJS, and later I discovered CasperJS, which made the job easier.
PhantomJS and CasperJS are headless browsers.
Since you have little experience with JavaScript, and if you are a C# developer, CefSharp may help you.
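The CasperJS route can be sketched roughly like this (the listing URL, selector, and the helper targetName are placeholders of mine; casper.download saves a URL to disk and should carry the session's cookies):

```javascript
// Sketch of a CasperJS script: run with `casperjs download_pdfs.js`.
// The `phantom` global only exists inside the PhantomJS/CasperJS runtime.

// Helper to name each saved file after its id.
function targetName(id) {
  return id + ".pdf";
}

if (typeof phantom !== "undefined") {
  var casper = require("casper").create();

  // Placeholder URL; the real listing page would go here.
  casper.start("https://example.gov/listado.do", function () {
    // Pull the ids out of the page, same link pattern as in the question.
    var ids = this.evaluate(function () {
      var links = document.querySelectorAll('a[href^="javascript:viajeros"]');
      return Array.prototype.map.call(links, function (a) {
        return a.href.split('"')[1];
      });
    });
    var lang = this.evaluate(function () {
      return document.forms[0].idioma.value;
    });
    ids.forEach(function (id) {
      // casper.download fetches the URL and writes the response to disk.
      this.download("https://example.gov/parteViajeros.do?lang=" + lang +
                    "&id_fichero=" + id, targetName(id));
    }, this);
  });

  casper.run();
}
```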
Some useful links:
- Getting started with PhantomJS, CasperJS, and CefSharp
- The documentation of each on downloading files
Upvotes: 1