Reputation: 23
I'm trying to read out the code of a website. But there is an issue if I want to receive the code of this site for example: "https://www.amazon.de/gp/bestsellers/pet-supplies/#2" I tried a lot, but still im just receiving the code of https://www.amazon.de/gp/bestsellers/pet-supplies". So something does not work right as I want to receive place 21-40 and not 1-20. I'm using an URLConneciton and a BufferedReader:
public String fetchPage(String urlS){
String s = null;
String qc = null;
try{
URL url = new URL(urlS);
URLConnection uc = url.openConnection();
uc.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0");
BufferedReader reader = new BufferedReader(new InputStreamReader(uc.getInputStream()));
while((s = reader.readLine()) != null){
qc += s;
}
reader.close();
} catch(IOException e) {
e.printStackTrace();
qc = "receiving qc failed";
}
return qc;
}
Thank you in advance for your effort :)
Upvotes: 0
Views: 26
Reputation: 658
The URL you're fetching, contains an achor (the #2 at the end). An anchor is a client-side concept and is originally used to jump to a certain part of the page. Some webapps (mostly single-page apps) use the anchor to keep track of some sort of state (eg. what page of products you're viewing).
Since the anchor is a client side concept, the responding webserver (or your browser/HTTP client library) just drops any anchors as if you actually requested https://www.amazon.de/gp/bestsellers/pet-supplies
.
Bottom line is that you'll never get the second page... Goog luck in scraping Amazon though ;)
Upvotes: 1