Reputation: 256
I am trying to scrape a Wikipedia page to fetch a list of airlines: first I scrape the main list page, then I visit each airline's individual page to get its website URL. I have split the code into two functions. The first scrapes the main page and builds a new URL; the second scrapes the page at that URL to get the website name. I use the request-promise module to fetch the HTML and cheerio to parse it.
export async function getAirlinesWebsites(req,res) {
  let response = await request(options_mainpage);
  console.log(`Data`);
  let $ = cheerio.load(response);
  console.log('Response got');
  $('tr').each((i,e)=>{
    let children = '';
    console.log('inside function ', i);
    if($(e).children('td').children('a').attr('class') !== 'new') {
      children = $(e).children('td').children('a').attr('href');
      let wiki_url = 'https://en.wikipedia.org' + children;
      console.log(`wiki_url = ${wiki_url}`);
      let airline_url = getAirlineUrl(wiki_url);
      console.log(`airline_url = ${airline_url}`);
    }
  })
}
And then the getAirlineUrl() function will parse another page based on the provided url.
async function getAirlineUrl(url){
  const wiki_child_options = {
    url : url,
    headers : headers
  }
  let child_response = await request(wiki_child_options);
  let $ = cheerio.load(child_response);
  let answer = $('.infobox.vcard').children('tbody').children('tr').children('td').children('span.url').text();
  return answer;
}
However, when I log the returned value (airline_url) in the parent function, I get [object Promise] instead of a string. How do I resolve this issue?
Upvotes: 0
Views: 175
Reputation: 161457
Since your getAirlineUrl function returns a promise, you need to await that promise. You can't use await directly inside the .each callback because the callback is not an async function, and even if it were, it still wouldn't work, because .each does not wait for the promises its callback returns. The best fix is to avoid using .each and just use a loop.
export async function getAirlinesWebsites(req,res) {
  let response = await request(options_mainpage);
  console.log(`Data`);
  let $ = cheerio.load(response);
  console.log('Response got');
  for (const [i, e] of Array.from($('tr')).entries()) {
    let children = '';
    console.log('inside function ', i);
    if($(e).children('td').children('a').attr('class') !== 'new') {
      children = $(e).children('td').children('a').attr('href');
      let wiki_url = 'https://en.wikipedia.org' + children;
      console.log(`wiki_url = ${wiki_url}`);
      let airline_url = await getAirlineUrl(wiki_url);
      console.log(`airline_url = ${airline_url}`);
    }
  }
}
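To see the difference in isolation, here is a minimal sketch that runs without request-promise or cheerio. fakeGetAirlineUrl and collectSequentially are hypothetical stand-ins introduced only for illustration (they are not from the question); the stand-in just waits briefly and returns a made-up string. The sketch shows that an un-awaited call stringifies as [object Promise], while awaiting inside a for...of loop yields the actual strings.

```javascript
// Hypothetical stand-in for getAirlineUrl: resolves with a made-up value
// after a short delay instead of fetching a real page.
async function fakeGetAirlineUrl(wikiUrl) {
  await new Promise((resolve) => setTimeout(resolve, 10));
  return `website-for:${wikiUrl}`;
}

// Awaiting inside a plain for...of loop is allowed because the loop body
// sits directly inside an async function, not inside a callback.
async function collectSequentially(wikiUrls) {
  const results = [];
  for (const wikiUrl of wikiUrls) {
    const airlineUrl = await fakeGetAirlineUrl(wikiUrl);
    results.push(airlineUrl);
  }
  return results;
}

// Without await, the call returns a pending Promise, which turns into
// "[object Promise]" when interpolated into a template string.
const notAwaited = fakeGetAirlineUrl('/wiki/Example');
console.log(`airline_url = ${notAwaited}`);

// With await inside the loop, the resolved strings are collected in order.
collectSequentially(['/wiki/A', '/wiki/B']).then((urls) => console.log(urls));
```

Note that this loop fetches one page at a time, so the total run time grows with the number of rows.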
Upvotes: 0
Reputation: 331
An async function returns a promise. In that case, you need to use then to get the resolved value, or use await. This should work if the rest of your code is OK.
export async function getAirlinesWebsites(req, res) {
  let response = await request(options_mainpage);
  console.log(`Data`);
  let $ = cheerio.load(response);
  console.log("Response got");
  $("tr").each(async (i, e) => {
    let children = "";
    console.log("inside function ", i);
    if ($(e).children("td").children("a").attr("class") !== "new") {
      children = $(e).children("td").children("a").attr("href");
      let wiki_url = "https://en.wikipedia.org" + children;
      console.log(`wiki_url = ${wiki_url}`);
      let airline_url = await getAirlineUrl(wiki_url);
      console.log(`airline_url = ${airline_url}`);
    }
  });
}
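One caveat with an async callback: each log line will print a string, but the outer function does not wait for the callbacks to finish, so it can return before any lookup has completed. The sketch below illustrates this with a plain array and forEach, on the assumption that cheerio's .each likewise does not wait on promises returned by its callback. fakeGetAirlineUrl, withAsyncCallback and withPromiseAll are hypothetical names used only for this example. If you need all results before continuing, mapping to promises and awaiting Promise.all is one possible alternative.

```javascript
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical stand-in for getAirlineUrl: no network access, just a delay.
async function fakeGetAirlineUrl(wikiUrl) {
  await delay(10);
  return `website-for:${wikiUrl}`;
}

// Async callback inside forEach: the callbacks start, but nothing waits for
// them, so no result has been collected when this function returns.
async function withAsyncCallback(wikiUrls) {
  const results = [];
  wikiUrls.forEach(async (wikiUrl) => {
    results.push(await fakeGetAirlineUrl(wikiUrl));
  });
  return results.length;
}

// map + Promise.all: start every lookup, then wait for all of them.
// The resolved array keeps the same order as the input array.
async function withPromiseAll(wikiUrls) {
  return Promise.all(wikiUrls.map((wikiUrl) => fakeGetAirlineUrl(wikiUrl)));
}

withAsyncCallback(['/wiki/A', '/wiki/B']).then((count) =>
  console.log('collected before return:', count)
);
withPromiseAll(['/wiki/A', '/wiki/B']).then((urls) => console.log(urls));
```

Promise.all starts all requests at once, which is faster than a sequential loop but may put more load on the remote site when the list is long.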
Upvotes: 1