Max Frai

Reputation: 64266

Working with html generated from javascript

I have an HTML page in which JavaScript generates some of the content, and I need to parse that content from a Python script. I have saved a copy of the file on my computer. Is there a way to work with the 'already generated' HTML, as I see it in the browser after opening the page file? As I understand it, I have to work with the DOM (maybe with the xml2dom lib).

Upvotes: 0

Views: 294

Answers (2)

Alex Martelli

Reputation: 881477

Have you saved "the file" (web page, I imagine) before or after Javascript has altered it?

If "after", then it no longer matters that some of the HTML was generated by JavaScript -- you can just use popular parsers like lxml or BeautifulSoup to handle the HTML you have.
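
As a rough illustration of the "after" case, here is a minimal sketch that pulls text out of already-rendered markup. To stay dependency-free it uses the standard library's `html.parser` rather than lxml or BeautifulSoup, which would do the same job with less code. The `TextByTag` class, the `extract_text` helper, and the sample markup are all made up for this example; with a real saved page you would read the file's contents instead.

```python
from html.parser import HTMLParser


class TextByTag(HTMLParser):
    """Collect the text found inside every occurrence of one tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # > 0 while we are inside the target tag
        self.results = []   # one string per outermost matching element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1
            if self.depth == 1:
                self.results.append("")

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.results[-1] += data


def extract_text(html, tag):
    """Return the stripped text of each <tag> element in html."""
    parser = TextByTag(tag)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.results]


# Hypothetical stand-in for the contents of the saved, already-rendered page.
saved_html = (
    '<html><body><ul id="generated">'
    "<li>first</li><li>second</li>"
    "</ul></body></html>"
)
print(extract_text(saved_html, "li"))
```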

If "before", then first you need to let JavaScript do its work by automating a real browser; for that task, I would recommend SeleniumRC -- which brings you back to the "after" case;-).
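
A minimal sketch of the "before" case, assuming the Selenium WebDriver Python bindings (the successor to SeleniumRC) plus Firefox and its driver are installed. The helper names `local_file_url` and `rendered_html` are invented for this example, and content that scripts add asynchronously may need an explicit wait before the page source is read.

```python
from pathlib import Path


def local_file_url(path):
    """Turn a path to a saved page into a file:// URL a browser can open."""
    return Path(path).resolve().as_uri()


def rendered_html(url):
    """Load url in a real browser, let its scripts run, return the HTML."""
    # Imported here so the helper above works without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Firefox()  # assumption: Firefox and its driver exist
    try:
        driver.get(url)
        # Page source as reported by the browser once the page has loaded.
        return driver.page_source
    finally:
        driver.quit()
```

Calling `rendered_html(local_file_url("page.html"))` (with `page.html` standing in for your saved file) should give you a string you can hand to any HTML parser.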

Upvotes: 2

Pekka

Reputation: 449385

I think you may have a fundamental misunderstanding about what runs where: by the time JavaScript generates the content (on the client side), the server-side processing of the document has already taken place. There is no direct way for a server-side Python script to access HTML created by JavaScript. Basically, that HTML lives only "virtually" in the browser's DOM.

You would have to find a way to transmit that HTML to your Python script, most likely using Ajax. You would take the HTML and add it as a parameter to your Ajax call. (Remember to use POST as the request method so you don't run into size limits.)

An example using jQuery's AJAX functions:

$.ajax({
  url: "myscript.py",
  type: "POST",
  // your_html_content_here is a placeholder for the generated markup
  data: { html: your_html_content_here },
  success: function () {
    alert("sent HTML to python script!");
  }
});
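
On the receiving end, the Python side has to read the `html` field out of the POST body. Here is a minimal sketch, assuming the request arrives URL-encoded (jQuery's default for a `data` object) and using only the standard library's `http.server`. The names `html_from_form_body`, `ReceiveHtml`, and `serve`, and the port, are made up for illustration; a real deployment would more likely sit behind a web framework or CGI setup.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs


def html_from_form_body(body):
    """Return the 'html' field of a URL-encoded POST body, or '' if absent."""
    fields = parse_qs(body.decode("utf-8"), keep_blank_values=True)
    return fields.get("html", [""])[0]


class ReceiveHtml(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly as many bytes as the client says it sent.
        length = int(self.headers.get("Content-Length", 0))
        html = html_from_form_body(self.rfile.read(length))
        # ... hand `html` to your parser of choice here ...
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        self.wfile.write(f"received {len(html)} characters".encode("utf-8"))


def serve(port=8000):
    """Run the receiver until interrupted (port is an arbitrary choice)."""
    HTTPServer(("localhost", port), ReceiveHtml).serve_forever()
```

Calling `serve()` starts the receiver. Note that a page posting to it from a different origin may be subject to the browser's cross-origin restrictions.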

Upvotes: 0
