Scraping AJAX loaded content with python?

Question

I have a function that is called when I click a button. It looks like this:

var min_news_id = "68feb985-1d08-4f5d-8855-cb35ae6c3e93-1";
function loadMoreNews(){
  $("#load-more-btn").hide();
  $("#load-more-gif").show();
  $.post("/en/ajax/more_news",{'category':'','news_offset':min_news_id},function(data){
      data = JSON.parse(data);
      min_news_id = data.min_news_id||min_news_id;
      $(".card-stack").append(data.html);
  })
  .fail(function(){alert("Error : unable to load more news");})
  .always(function(){$("#load-more-btn").show();$("#load-more-gif").hide();});
}
jQuery.scrollDepth();

I don't have much experience with JavaScript, but I assume it is returning some JSON data from some sort of API at "/en/ajax/more_news".

Is there a way I could call this API directly and get the JSON data from my Python script? If yes, how?

If not, how do I scrape the content that is being generated?

Padraic Cunningham · Accepted Answer

You need to POST the news id that appears inside the page's script (the value of min_news_id) to https://www.inshorts.com/en/ajax/more_news as the news_offset field. Here is an example using requests:

import re

import requests
from bs4 import BeautifulSoup

# Pattern to extract the value of min_news_id from the inline script
patt = re.compile(r'var\s+min_news_id\s*=\s*"(.*?)"')

with requests.Session() as s:
    # Fetch the page and locate the <script> that defines min_news_id
    soup = BeautifulSoup(s.get("https://www.inshorts.com/en/read").content, "html.parser")
    new_id_scr = soup.find("script", string=patt)
    # group(1) is the captured id; group() would return the whole "var ..." match
    news_id = patt.search(new_id_scr.string).group(1)
    # Send the same form fields that the page's JavaScript posts
    js = s.post("https://www.inshorts.com/en/ajax/more_news",
                data={"category": "", "news_offset": news_id})
    print(js.json())

js is the response object; js.json() returns the decoded data as a dict, and all the HTML is under js.json()["html"].
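
To load more than one batch, the JavaScript in the question suggests the response also carries a min_news_id that becomes the next news_offset. The following is only a sketch under that assumption; extract_min_news_id, fetch_pages, demo and the max_pages limit are hypothetical names introduced for illustration, and the exact response fields have not been verified against the site.

```python
import re

# Same idea as the pattern in the answer: capture the id inside the quotes.
MIN_ID_RE = re.compile(r'var\s+min_news_id\s*=\s*"(.*?)"')


def extract_min_news_id(page_html):
    """Return the min_news_id embedded in the page source, or None if absent."""
    match = MIN_ID_RE.search(page_html)
    return match.group(1) if match else None


def fetch_pages(post, first_offset, max_pages=3):
    """Collect HTML fragments by following min_news_id from one response to the next.

    `post` is any callable that takes the form-data dict and returns the
    decoded JSON dict (assumed to hold "html" and, optionally, "min_news_id").
    """
    offset = first_offset
    fragments = []
    for _ in range(max_pages):
        data = post({"category": "", "news_offset": offset})
        fragments.append(data.get("html", ""))
        # Mirrors the JavaScript: data.min_news_id || min_news_id
        next_offset = data.get("min_news_id") or offset
        if next_offset == offset:
            break  # no new offset was returned, so stop paging
        offset = next_offset
    return fragments


def demo():
    """Example wiring with requests; performs real network calls when invoked."""
    import requests

    with requests.Session() as s:
        page = s.get("https://www.inshorts.com/en/read").text
        first = extract_min_news_id(page)

        def post(form):
            return s.post("https://www.inshorts.com/en/ajax/more_news", data=form).json()

        for fragment in fetch_pages(post, first):
            print(len(fragment))
```

Calling demo() prints the length of each HTML fragment; each fragment can then be handed to BeautifulSoup in the same way as the first page. Passing the request function in as an argument keeps the paging logic separate from the network code, so it can be checked with a stub.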

