Suk-ju Lee

Reputation: 11

How to parse JavaScript in a web page?

I am trying to parse a web page using Python 2.7, and I want to read its entire HTML. But the result looks like this:

<html><head><script type="text/javascript">
location.replace( "http://captcha.search.daum.net/captcha/show?url=http%3A%2F%2Fsearch.daum.net%2Fsearch%3Fw%3Dnews%26nil_search%3Dbtn%26DA%3DNTB%26enc%3Dutf8%26cluster%3Dy%26cluster_page%3D1%26q%3D%25EB%25B3%25B4%25EA%25B3%25A0%25EC%2584%259C" );
</script>
</head></html>

I think this web page uses JavaScript. How can I parse the entire HTML code behind this JavaScript?

My Python code is this:

# -*- coding: utf-8 -*-

import urllib2
from bs4 import BeautifulSoup

url = "http://search.daum.net/search?w=news&nil_search=btn&DA=NTB&enc=utf8&cluster=y&cluster_page=1&q=%EB%B3%B4%EA%B3%A0%EC%84%9C"
page = urllib2.urlopen(url)
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(page.read(), "html.parser")

print soup

Upvotes: 1

Views: 144

Answers (1)

alizelzele

Reputation: 923

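It seems some headers are required for this page to be shown properly. The response you got redirects to a captcha page, which suggests the site did not treat the request as coming from a normal browser.

Try adding headers to your request (the urllib2 call, not the BeautifulSoup call), sending the same headers and parameters that your browser sends, so that you get the result you see in the browser.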
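A minimal sketch of that idea, assuming the site mainly checks for browser-like headers. The header values below are placeholders, not values known to work for this site; copy the real ones from your browser's developer tools. The names BROWSER_HEADERS, build_request and fetch_html are made up for this example. The import fallback only keeps the sketch runnable on Python 3 as well as the Python 2.7 used in the question.

```python
try:
    from urllib2 import Request, urlopen  # Python 2.7, as in the question
except ImportError:
    from urllib.request import Request, urlopen  # Python 3 fallback

# Assumed header values: replace them with what your own browser sends.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "ko-KR,ko;q=0.9,en;q=0.8",
}


def build_request(url, headers=BROWSER_HEADERS):
    """Return a Request object that carries browser-like headers."""
    return Request(url, headers=dict(headers))


def fetch_html(url):
    """Download the page body as bytes (needs network access)."""
    response = urlopen(build_request(url))
    try:
        return response.read()
    finally:
        response.close()
```

You would then pass the result to BeautifulSoup as before, for example BeautifulSoup(fetch_html(url), "html.parser"). If the site still answers with the captcha redirect, it may be checking cookies or request rate as well, which this sketch does not handle.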

Upvotes: 1
