I'm relatively new to scraping and wanted to try this as a learning experience. My end goal is to scrape item stats from a game website, https://lucy.allakhazam.com/, and post them via a Discord bot. However, I've run into a problem just loading the HTML from the site, and I'm not sure what's causing it.
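const request = require("request");
const cheerio = require("cheerio");
request("https://lucy.allakhazam.com/item.html?id=28855", function (error, response, html) {
  if (error) {
    console.log("Error: " + error);
    return; // response is undefined when the request fails
  }
  console.log("Status code: " + response.statusCode);
  var $ = cheerio.load(html);
  console.log(html);
});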
The only output from the console is:
<head><meta HTTP-EQUIV="Refresh" CONTENT="0; URL=/index.html?setcookie=1"></head>
I've tried other sites and can get their raw HTML, but not this one, and I'm not sure why. Any help is appreciated, thank you!
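The response you're seeing is a meta refresh: rather than an HTTP 3xx redirect, the server sends a small HTML page that tells a browser to navigate to /index.html?setcookie=1. Browsers follow that automatically, but HTTP clients like request and fetch only follow 3xx redirects, so you have to handle it yourself.
I'd use a promise-based request library like fetch (native since Node 18), node-fetch, or axios, since request is deprecated. One option is to hardcode the setcookie=1 parameter from the redirect into the item URL: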
const cheerio = require("cheerio"); // 1.0.0-rc.12
// Item URL with the setcookie=1 parameter from the meta refresh appended
const url = "https://lucy.allakhazam.com/item.html?id=28855&setcookie=1";
fetch(url)
  .then(res => {
    if (!res.ok) {
      throw Error(res.statusText);
    }
    return res.text();
  })
  .then(html => {
    const $ = cheerio.load(html);
    // Trimmed text of each child node of .shotdata, dropping empty strings
    const text = $(".shotdata")
      .contents()
      .get()
      .map(e => $(e).text().trim())
      .filter(e => e);
    console.log(text);
  })
  .catch(err => console.error(err));
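Since the end goal is a Discord bot that looks up arbitrary items, it may help to build the item URL from an id instead of hardcoding it. This is a minimal sketch, assuming Node 18+ (global fetch) and assuming that appending setcookie=1 works for any item id the way it does for 28855; itemUrl and fetchItemHtml are hypothetical helper names, not part of any library.

```javascript
// Sketch only: itemUrl and fetchItemHtml are hypothetical helpers.
// Assumes Node 18+ (global fetch) and that setcookie=1 works for any item id.
const BASE = "https://lucy.allakhazam.com";

// Build the item page URL with the setcookie=1 parameter appended.
function itemUrl(id) {
  const url = new URL("/item.html", BASE);
  url.searchParams.set("id", String(id));
  url.searchParams.set("setcookie", "1");
  return url.toString();
}

// Fetch an item page and resolve to its HTML, rejecting on non-2xx statuses.
async function fetchItemHtml(id) {
  const res = await fetch(itemUrl(id));
  if (!res.ok) {
    throw new Error(`${res.status} ${res.statusText}`);
  }
  return res.text();
}
```

You could then call fetchItemHtml(28855) and pass the resolved HTML to cheerio.load exactly as in the .then(html => ...) step above. Using URL and searchParams rather than string concatenation keeps the query string correctly encoded if the id ever comes straight from user input in a bot command.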
If you need to handle a dynamic redirect, you could parse the redirect URL out of the meta tag and perform a second request:
const cheerio = require("cheerio"); // 1.0.0-rc.12
// Fetch a URL and resolve to its body text, rejecting on non-2xx statuses
const get = url =>
  fetch(url).then(res => {
    if (!res.ok) {
      throw Error(res.statusText);
    }
    return res.text();
  });
const url = "https://lucy.allakhazam.com/item.html?id=28855";
get(url)
  .then(html => {
    const $ = cheerio.load(html);
    // content is "0; URL=/index.html?setcookie=1"; keep the part after the last "/"
    const redirect = $('meta[http-equiv="Refresh"]')
      .attr("content")
      .split("/")
      .at(-1);
    return get(`${new URL(url).origin}/${redirect}`);
  })
  .then(html => {
    const $ = cheerio.load(html);
    const text = $(".shotdata")
      .contents()
      .get()
      .map(e => $(e).text().trim())
      .filter(e => e);
    console.log(text);
  })
  .catch(err => console.error(err));
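The split("/").at(-1) step works for this page's content value, but it keeps only the last path segment, so a target such as /items/view.html or an absolute URL on another host would be resolved incorrectly. A slightly more general sketch, assuming the usual "delay; URL=target" format of the content attribute, extracts the target and resolves it against the page URL with the built-in URL class; resolveMetaRefresh is a hypothetical helper name.

```javascript
// Sketch only: resolveMetaRefresh is a hypothetical helper.
// Assumes the common "delay; URL=target" format of a meta refresh content
// attribute; the target may be quoted, relative, or absolute.
function resolveMetaRefresh(content, pageUrl) {
  const match = /url\s*=\s*['"]?([^'"]+)['"]?/i.exec(content || "");
  if (!match) {
    return null; // e.g. "5" (reload the same page) or a missing attribute
  }
  // Resolve relative targets against the page URL, as a browser would.
  return new URL(match[1].trim(), pageUrl).toString();
}
```

With this helper, the redirect step in the second snippet could become get(resolveMetaRefresh($('meta[http-equiv="Refresh"]').attr("content"), url)). Returning null when there is no URL lets the caller decide whether to stop or simply re-request the same page.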