I am trying to parse a web page with Python 2.7 and I want to read its entire HTML. But the result looks like this:
<html><head><script type="text/javascript">
location.replace( "http://captcha.search.daum.net/captcha/show?url=http%3A%2F%2Fsearch.daum.net%2Fsearch%3Fw%3Dnews%26nil_search%3Dbtn%26DA%3DNTB%26enc%3Dutf8%26cluster%3Dy%26cluster_page%3D1%26q%3D%25EB%25B3%25B4%25EA%25B3%25A0%25EC%2584%259C" );
</script>
</head></html>
I think this page uses JavaScript. How can I get the entire HTML that the JavaScript leads to?
My Python code:
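#-*- coding: utf-8 -*-
import urllib2
from bs4 import BeautifulSoup
url = "http://search.daum.net/search?w=news&nil_search=btn&DA=NTB&enc=utf8&cluster=y&cluster_page=1&q=%EB%B3%B4%EA%B3%A0%EC%84%9C"
# Fetch the raw HTML exactly as the server returns it (urllib2 runs no JavaScript)
page = urllib2.urlopen(url)
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(page.read(), "html.parser")
print soup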
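The HTML printed above is not the search page itself: its only content is a JavaScript `location.replace(...)` call pointing at `captcha.search.daum.net`, and the `url` query parameter of that address wraps the original search URL. A minimal sketch for detecting this in a script, assuming the redirect always has the `location.replace("...")` form seen here; `find_js_redirect` and `unwrap_target` are hypothetical helper names, and the imports are written to work on both Python 2.7 and Python 3.

```python
import re

try:  # Python 3
    from urllib.parse import urlparse, parse_qs
except ImportError:  # Python 2.7
    from urlparse import urlparse, parse_qs

# Assumed pattern: location.replace("...") or location.replace('...'),
# with optional whitespace inside the parentheses.
REDIRECT_RE = re.compile(r"""location\.replace\(\s*["']([^"']+)["']\s*\)""")


def find_js_redirect(html):
    """Hypothetical helper: return the URL given to location.replace, or None."""
    match = REDIRECT_RE.search(html)
    return match.group(1) if match else None


def unwrap_target(redirect_url):
    """Hypothetical helper: return the decoded 'url' query parameter, or None."""
    params = parse_qs(urlparse(redirect_url).query)
    values = params.get("url")
    return values[0] if values else None


# The response body from the question, used as sample input.
html = '''<html><head><script type="text/javascript">
location.replace( "http://captcha.search.daum.net/captcha/show?url=http%3A%2F%2Fsearch.daum.net%2Fsearch%3Fw%3Dnews%26nil_search%3Dbtn%26DA%3DNTB%26enc%3Dutf8%26cluster%3Dy%26cluster_page%3D1%26q%3D%25EB%25B3%25B4%25EA%25B3%25A0%25EC%2584%259C" );
</script>
</head></html>'''

redirect = find_js_redirect(html)
print(urlparse(redirect).netloc)  # host the page redirects to
print(unwrap_target(redirect))    # the search URL that was requested
```

The unwrapped value is exactly the search URL from the question, which suggests the server treated the script as a non-browser client and sent it to a captcha check instead of returning results. Executing the JavaScript would only land on that captcha page.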
It seems this page requires certain request headers before it shows the real results.
Try adding the headers your browser sends (for example `User-Agent`) to your urllib2 request, so the server receives the same parameters as it does from your browser and returns the result you see there.
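A minimal sketch of that suggestion, written to run on Python 2.7 (`urllib2`) or Python 3 (`urllib.request`). The values in `BROWSER_HEADERS` are placeholders, not headers Daum is known to check; copy the real ones from your browser's developer tools (Network tab). `build_request` and `fetch_html` are hypothetical helper names, and headers alone may not be enough if the site also relies on cookies.

```python
try:  # Python 3
    from urllib.request import Request, urlopen
except ImportError:  # Python 2.7
    from urllib2 import Request, urlopen

URL = ("http://search.daum.net/search?w=news&nil_search=btn&DA=NTB"
       "&enc=utf8&cluster=y&cluster_page=1&q=%EB%B3%B4%EA%B3%A0%EC%84%9C")

# Placeholder browser-like headers (assumption); replace with the values
# your own browser sends for the request that works there.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "ko-KR,ko;q=0.9,en;q=0.8",
}


def build_request(url, headers=None):
    """Hypothetical helper: build a GET request that carries the given headers."""
    return Request(url, headers=headers if headers is not None else BROWSER_HEADERS)


def fetch_html(url, timeout=10):
    """Hypothetical helper: fetch the page with browser-like headers, return raw bytes."""
    response = urlopen(build_request(url), timeout=timeout)
    try:
        return response.read()
    finally:
        response.close()
```

Pass the result to BeautifulSoup as before, e.g. `BeautifulSoup(fetch_html(URL), "html.parser")`. If the output is still the `location.replace(...)` redirect, compare more of the browser's request, cookies in particular.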