Reputation: 13
I have created a list of all the page URIs of a vehicle service manual that I'd like to download an image from.
The images are delivered via a PHP script, as can be seen here: http://www.atfinley.com/service/index.php?cat=g2&page=32
This is probably meant to deter behavior like my own. However, Acura Legend owners shouldn't have to depend on a single host for their vehicle's manual.
I'd like to design a bot in JS/Java that can visit every URL I've stored in this text document, https://pastebin.com/yXdMJipq, and automate the download of the PNG available at each resource.
I'll eventually be creating a PDF of the manual and publishing it for open and free use.
If anyone has ideas for libraries I could use, or ways to delve into the solution, please let me know. I am most fluent in Java.
I'm thinking a solution might be to fetch the HTML document at each URL and download the image referenced by the src attribute of its <img> tag.
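For illustration, here is a minimal JDK-only sketch of that idea: pull the first <img> src out of a page's HTML and resolve it against the page URL, since the src may be relative. The class name, the method name, and the sample markup (including the path images/g2/32.png) are assumptions made up for this example, not taken from the real site, and the regex is only a rough stand-in for a proper HTML parser.

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImgSrcSketch {
    // Matches a quoted src attribute inside an <img> tag.
    // Fragile on purpose: real-world HTML is better handled by an HTML parser.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Returns the absolute URL of the first <img> in the HTML, if there is one.
    static Optional<String> firstImageUrl(String pageUrl, String html) {
        Matcher m = IMG_SRC.matcher(html);
        if (!m.find()) {
            return Optional.empty();
        }
        // resolve() turns a relative src into an absolute URL based on the page URL
        return Optional.of(URI.create(pageUrl).resolve(m.group(1)).toString());
    }

    public static void main(String[] args) {
        String pageUrl = "http://www.atfinley.com/service/index.php?cat=g2&page=32";
        // Hypothetical markup; the real page may look different
        String html = "<html><body><img src=\"images/g2/32.png\"></body></html>";
        System.out.println(firstImageUrl(pageUrl, html).orElse("no <img> found"));
    }
}
```

In a real run the HTML string would come from an HTTP request for each URL in the list, and the resolved URL would then be downloaded in a second step.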
Upvotes: 0
Views: 1725
Reputation: 13
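import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Scanner;
import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;

// Class name chosen here for illustration; the original snippet omitted it
public class AcuraDownload {
    public static void main(String[] args) {
        final int total = 2690; // total hard-coded in the original progress calculation
        int count = 0;
        // imgurls.txt holds one image URL per line
        try (Scanner read = new Scanner(new File("F:/imgurls.txt"))) {
            while (read.hasNextLine()) {
                count++;
                String url = read.nextLine();
                try {
                    // ignoreContentType lets Jsoup fetch non-HTML responses such as PNGs
                    Response imageResponse = Jsoup.connect(url).ignoreContentType(true).execute();
                    try (FileOutputStream writer = new FileOutputStream(new File("F:/Acura/" + count + ".png"))) {
                        writer.write(imageResponse.bodyAsBytes());
                    }
                    // Multiply by 100 so the printed value really is a percentage
                    System.out.println((100.0 * count / total) + "%");
                } catch (IOException e) {
                    // Report the failure and carry on with the next URL
                    e.printStackTrace();
                }
            }
        } catch (FileNotFoundException e1) {
            e1.printStackTrace();
        }
    }
}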
This worked for generating the PNGs.
Upvotes: 0
Reputation: 13
Finished solution for grabbing the image URLs:
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;
import java.util.Scanner;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class Acura {
    public static void main(String[] args) {
        final int total = 2690; // total hard-coded in the original progress calculation
        int count = 0;
        // try-with-resources closes both files even if something goes wrong
        try (Scanner read = new Scanner(new File("F:/result.txt"));
             Writer write = new FileWriter("F:/imgurls.txt")) {
            while (read.hasNextLine()) {
                count++;
                String url = read.nextLine();
                try {
                    Document doc = Jsoup.connect(url).get();
                    // Take the first <img> on the page
                    Element img = doc.select("img").first();
                    if (img == null) {
                        // first() returns null when nothing matched; skip instead of crashing
                        System.err.println("No <img> found at " + url);
                        continue;
                    }
                    // absUrl resolves a relative src against the page URL
                    write.write(img.absUrl("src") + "\n");
                    // Multiply by 100 so the printed value really is a percentage
                    System.out.println((100.0 * count / total) + "%");
                } catch (IOException e) {
                    // Report the failure and carry on with the next page
                    e.printStackTrace();
                }
            }
        } catch (IOException e1) {
            // Input list missing or output file not writable
            e1.printStackTrace();
        }
    }
}
This generates a nice long list of image URLs in a text document. I could have done it in a non-sequential manner, but I was heavily intoxicated when I did this. However, I did add a progress printout for my own peace of mind :)
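For anyone curious what the non-sequential version might look like, here is a rough sketch using a fixed thread pool from the JDK. The class name, the method name, and the resolver parameter are all made up for this example; in practice the resolver would wrap the Jsoup lookup, and the thread count should stay small so the host is not hammered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ParallelFetchSketch {
    // Applies resolver to every page URL on a pool of worker threads.
    // Results come back in the same order as the input list.
    static List<String> mapInParallel(List<String> pageUrls,
                                      Function<String, String> resolver,
                                      int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit everything first so the lookups can overlap
            List<Future<String>> pending = new ArrayList<>();
            for (String url : pageUrls) {
                Callable<String> task = () -> resolver.apply(url);
                pending.add(pool.submit(task));
            }
            // Collect in submission order; get() blocks until each task is done
            List<String> results = new ArrayList<>();
            for (Future<String> f : pending) {
                try {
                    results.add(f.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("Lookup failed", e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in resolver; a real one would fetch the page and return the img URL
        List<String> pages = List.of("page=1", "page=2", "page=3");
        System.out.println(mapInParallel(pages, p -> p + ".png", 4));
    }
}
```

Because results are collected in submission order, the output file would still line up with the input list even though pages finish at different times.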
Upvotes: 0
Reputation: 193
I have written something similar but, unfortunately, I can't find it anymore. Nevertheless, I remember using the JSoup Java library, which comes in pretty handy.
It includes an HTTP client, and you can run CSS selectors on the document just like with jQuery.
This is the example from their front page:
Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
Elements newsHeadlines = doc.select("#mp-itn b a");
Creating PDFs is quite tricky, but I use Apache PDFBox for such things.
Upvotes: 1
Reputation: 1145
I know you asked for a JavaScript solution but I believe PHP (which you also added as a tag) is more suitable for the task. Here are some guidelines to get you started:
If this process proves to be too long and you get a PHP runtime error, consider storing the URLs generated by step 2 in a file, then using that file to generate a new array and running step 3 on it as a separate process.
Upvotes: 1