I am building a web scraper that has to quickly retrieve the text of a web page from the HTML alone. I'm using Python, requests and BeautifulSoup.
I would like to detect whether the page content is plain HTML or rendered by JavaScript. In the latter case, I would just return an error message saying that this cannot be done.
I know headless browsers can render the JavaScript, but here I only need to detect it as fast as possible, without rendering the page.
Checking for script tags doesn't really work: nearly every web page has many of them, and their presence doesn't necessarily mean the text content is rendered by JavaScript.
Is there something I could check in the HTML that accurately tells me the body content will be rendered by JavaScript?
Thank you
Nothing in the initial DOM reliably shows beforehand that a site is rendered with JavaScript. Here are some things you could try:
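One rough heuristic, offered as an assumption rather than a reliable test, is to ignore the mere presence of script tags and instead measure how much visible text the raw HTML contains once script, style, noscript, template and title contents are discarded. A page that loads scripts but ships almost no text is probably a JavaScript "app shell". The function name looks_js_rendered and the 200-character threshold are hypothetical choices, not taken from any library, and both false positives (short static pages) and false negatives (server-rendered pages that are later rewritten by JavaScript) are possible. The sketch uses the standard library's html.parser so it runs without extra packages; the same idea works with BeautifulSoup by removing those tags and calling get_text().

```python
from html.parser import HTMLParser

# Tags whose contents are not shown as page text (hypothetical choice of set).
_HIDDEN_TAGS = {"script", "style", "noscript", "template", "title"}


class _VisibleTextParser(HTMLParser):
    """Collects text outside hidden tags and counts <script> elements."""

    def __init__(self):
        super().__init__()
        self.hidden_depth = 0   # > 0 while inside a hidden tag
        self.script_count = 0
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_count += 1
        if tag in _HIDDEN_TAGS:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in _HIDDEN_TAGS and self.hidden_depth > 0:
            self.hidden_depth -= 1

    def handle_data(self, data):
        # Only keep text that a browser would display without running scripts.
        if self.hidden_depth == 0:
            self.text_parts.append(data)


def looks_js_rendered(html: str, min_text_chars: int = 200) -> bool:
    """Heuristic guess: True if the page loads scripts but has little visible text.

    min_text_chars is an arbitrary, assumed threshold; tune it for your sites.
    """
    parser = _VisibleTextParser()
    parser.feed(html)
    parser.close()
    # Collapse all whitespace so indentation does not count as content.
    visible = " ".join(" ".join(parser.text_parts).split())
    return parser.script_count > 0 and len(visible) < min_text_chars
```

In the asker's setup, this would be called on the raw response body, for example looks_js_rendered(requests.get(url, timeout=10).text), before handing the HTML to BeautifulSoup. No rendering happens, so the cost is a single HTML parse.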