Reputation: 15
I am trying to scrape the results of a Quora search query using IMPORTXML in Google Sheets.
The URL has this form: https://www.quora.com/search?q=scrape%20Quora&time=year
I can't get anything to work. For example, inspecting the page shows that each question title is inside a div with the class 'q-text puppeteer_test_question_title', so I tried the formula below, but it just returns #N/A:
=IMPORTXML("https://www.quora.com/search?q=scrape%20Quora&time=year", "//div[@class='q-text puppeteer_test_question_title']")
Is there a fix, or is this simply not possible (and if not, why)? Thank you.
Upvotes: -1
Views: 71
Reputation: 15328
You can try fetching the first three results this way (quickly written; it could be improved):
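function myFunction() {
  const options = {
    muteHttpExceptions: true, // return error pages instead of throwing
    followRedirects: false
  };
  const url = 'https://www.quora.com/search?q=scrape%20Quora&time=year';
  const html = UrlFetchApp.fetch(url, options).getContentText();
  // Each embedded result is assigned in a script as results["<key>"] = <json>
  const jsonStrings = html.split('window.ansFrontendGlobals.data.inlineQueryResults.results["');
  // Chunk 0 is the HTML before the first marker, so skip it
  jsonStrings.slice(1).forEach(jsonString => {
    const parts = jsonString.split('"] = ');
    if (parts.length > 1) {
      // Log the JSON literal, which runs to the end of the line
      console.log(parts[1].split('\n')[0]);
    }
  });
}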
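You then need to parse the complex JSON inside each string. Note, however, that only the first results are embedded in the page: Quora loads the rest with asynchronous (AJAX) requests as you scroll down, so this approach will not see them.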
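Here is a minimal sketch of that parsing step, in the same JavaScript the script above uses. It assumes (not verified against Quora's current markup, which can change at any time) that each results["<key>"] = <value> assignment sits on a single line, may end with a semicolon, and that <value> is either a JSON object or a JSON-encoded string. The function name extractInlineResults and the sample HTML are hypothetical, made up only to show the shape of the input.

```javascript
// Marker taken from the script above; everything after it up to '"] = ' is the key.
const MARKER = 'window.ansFrontendGlobals.data.inlineQueryResults.results["';

// Hypothetical helper: returns [{ key, data }] for every assignment that parses as JSON.
function extractInlineResults(html) {
  const results = [];
  // Skip chunk 0: it is the page content before the first marker.
  for (const chunk of html.split(MARKER).slice(1)) {
    const sep = chunk.indexOf('"] = ');
    if (sep === -1) continue;
    const key = chunk.slice(0, sep);
    // Assumption: the value runs to the end of the line; drop a trailing semicolon.
    let literal = chunk.slice(sep + '"] = '.length).split('\n')[0].trim();
    if (literal.endsWith(';')) literal = literal.slice(0, -1);
    try {
      let data = JSON.parse(literal);
      // Assumption: the payload may itself be a JSON-encoded string; decode it again.
      if (typeof data === 'string') {
        try { data = JSON.parse(data); } catch (e) { /* keep the plain string */ }
      }
      results.push({ key, data });
    } catch (e) {
      // Not JSON (e.g. a JavaScript expression): skip it rather than fail.
    }
  }
  return results;
}

// Made-up sample with one plain object and one JSON-encoded string.
const sample = [
  '<script>',
  'window.ansFrontendGlobals.data.inlineQueryResults.results["a1"] = {"title": "How to scrape Quora?"};',
  'window.ansFrontendGlobals.data.inlineQueryResults.results["b2"] = "{\\"title\\": \\"Is scraping allowed?\\"}";',
  '</script>'
].join('\n');
console.log(extractInlineResults(sample).map(r => r.data.title));
// → ['How to scrape Quora?', 'Is scraping allowed?']
```

In Apps Script you would pass UrlFetchApp.fetch(url, options).getContentText() to extractInlineResults instead of logging the raw strings. It still only covers results embedded in the initial HTML, not the ones loaded while scrolling.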
Upvotes: 1
Reputation: 1
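Quora (as of now) renders its content with JavaScript, and Google Sheets import formulas such as IMPORTXML do not execute JavaScript: they only parse the HTML the server returns. The question-title divs you see in the browser's inspector are most likely created by scripts after the page loads, so they are not in the document IMPORTXML receives, and the XPath matches nothing (#N/A).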
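One way to check this yourself is to look at the HTML before any JavaScript runs. The sketch below is an assumption-laden diagnostic, not a fix: countExactClass and checkRawQuoraHtml are hypothetical names, a regular expression is only a rough stand-in for an HTML parser, and the HTML Google's import fetcher receives may differ from what UrlFetchApp gets (user agent, redirects, blocking). The regex mirrors the XPath predicate from the question, which requires the whole class attribute to equal the given string.

```javascript
// Count elements whose class attribute is exactly className, like @class='...' in XPath.
function countExactClass(html, className) {
  // Escape regex metacharacters so the class name is matched literally.
  const escaped = className.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Opening quote captured as \1 so class="..." and class='...' both match.
  const re = new RegExp(`class\\s*=\\s*(["'])${escaped}\\1`, 'g');
  return (html.match(re) || []).length;
}

// Apps Script only (UrlFetchApp does not exist elsewhere): logs the count
// for the raw, unrendered HTML of the search page.
function checkRawQuoraHtml() {
  const url = 'https://www.quora.com/search?q=scrape%20Quora&time=year';
  const html = UrlFetchApp.fetch(url, { muteHttpExceptions: true }).getContentText();
  Logger.log(countExactClass(html, 'q-text puppeteer_test_question_title'));
}

// Example: a page shell whose content would be rendered later by JavaScript.
console.log(countExactClass('<div id="root"></div><script src="app.js"></script>',
  'q-text puppeteer_test_question_title')); // → 0
```

If checkRawQuoraHtml logs 0 while the browser inspector shows the divs, the titles are being added client-side, which would explain the #N/A.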
Upvotes: 1