Reputation: 19
I want to get answers from an online chatbot: http://talkingbox.dyndns.org:49495/braintalk? (the trailing ? is part of the link)
To send a question you just have to send a simple request:
http://talkingbox.dyndns.org:49495/in?id=3B9054BC032E53EF691A9A1803040F1C&msg=[Here the question]
The page source looks like this:
<frameset cols="*,185" frameborder="no" border="0" framespacing="0">
  <frameset rows="100,*,82" frameborder="no" border="0" framespacing="0">
    <frame src="http://thebot.de/bt_banner.html" marginwidth="0" name="frtop" scrolling="no" marginheight="0" frameborder="no">
    <frame src="out?id=3B9054BC032E53EF691A9A1803040F1C" name="frout" marginwidth="0" marginheight="0">
    <frameset rows="100%,*" border="0" framespacing="0" frameborder="no">
      <frame src="bt_in?id=3B9054BC032E53EF691A9A1803040F1C" name="frin" scrolling="no" marginwidth="0" marginheight="0" noresize>
      <frame src="" name="frempty" marginwidth="0" marginheight="0" scrolling="auto" frameborder="no" >
    </frameset>
  </frameset>
  <frameset frameborder="no" border="0" framespacing="0" rows="82,*">
    <frame src="stats?" name="fr1" scrolling="no" marginwidth="0" marginheight="0" frameborder="no">
    <frame src="http://thebot.de/bt_rechts.html" name="fr2" scrolling="auto" marginwidth="0" marginheight="0" frameborder="no" >
  </frameset>
</frameset>
I have been using mechanize and BeautifulSoup for web scraping, but I suspect mechanize does not support dynamic web pages.
How can I get the answers in this case?
I am also looking for a solution that works well on both Windows and Linux.
Upvotes: 0
Views: 2301
Reputation: 11396
Whether you use BeautifulSoup, mechanize, Requests or even Scrapy, loading the dynamic page has to be done in a separate step that you write yourself.
For example, with Scrapy it might look something like this:
import scrapy

class TheBotSpider(scrapy.Spider):
    name = 'thebot'
    allowed_domains = ['thebot.de', 'talkingbox.dyndns.org']

    def __init__(self, *a, **kw):
        # self.question is set from the "-a question=..." command-line argument
        super(TheBotSpider, self).__init__(*a, **kw)
        self.domain = 'http://talkingbox.dyndns.org:49495/'
        self.start_urls = [self.domain +
                           'in?id=3B9054BC032E53EF691A9A1803040F1C&msg=' +
                           self.question]

    def parse(self, response):
        # the frame src is relative ("out?id=..."), so resolve it against the page URL
        src = response.xpath('//frame[@name="frout"]/@src').extract()[0]
        yield scrapy.Request(url=response.urljoin(src), callback=self.dynamic_page)

    def dynamic_page(self, response):
        # .... xpath to scrape answer
        pass
Run it with the question as an argument:
scrapy crawl thebot -a question=[Here the question]
For more details on how to use Scrapy, see the Scrapy tutorial.
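The frame lookup that parse performs can also be tried without Scrapy. Below is a minimal sketch using only the Python standard library; the names FrameCollector, frame_url and SAMPLE are made up for illustration, and SAMPLE is a trimmed copy of the frameset from the question rather than a live response.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Base URL of the chatbot page, taken from the question.
BASE_URL = "http://talkingbox.dyndns.org:49495/braintalk?"


class FrameCollector(HTMLParser):
    """Collect the src attribute of every <frame>, keyed by its name."""

    def __init__(self):
        super().__init__()
        self.frames = {}

    def handle_starttag(self, tag, attrs):
        if tag != "frame":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            self.frames[name] = attributes.get("src", "")


def frame_url(html, frame_name, base_url=BASE_URL):
    """Return the absolute URL of the named frame, or None if it is missing or empty."""
    collector = FrameCollector()
    collector.feed(html)
    src = collector.frames.get(frame_name)
    return urljoin(base_url, src) if src else None


# Trimmed copy of the frameset from the question (illustrative input only).
SAMPLE = """
<frameset rows="100,*,82">
<frame src="http://thebot.de/bt_banner.html" name="frtop">
<frame src="out?id=3B9054BC032E53EF691A9A1803040F1C" name="frout">
<frame src="" name="frempty">
</frameset>
"""

print(frame_url(SAMPLE, "frout"))
```

The relative src is resolved with urljoin because the frame only says out?id=..., which is not a URL you can request on its own.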
Upvotes: 1
Reputation: 19388
I would use Requests for a task like this.
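import requests
# your_question is a placeholder for the text to send; params= URL-encodes it for you
r = requests.get("http://talkingbox.dyndns.org:49495/in",
                 params={"id": "3B9054BC032E53EF691A9A1803040F1C", "msg": your_question})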
For web pages that do not contain dynamic content, r.text is what you want.
Since you didn't provide more information about the dynamic page, there is not much more to say.
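If the reply turns out to live in the out?id=... frame from the question's frameset, a two-step fetch might be enough. The sketch below rests on that unverified assumption; the helper names input_url, output_url and ask are hypothetical, and the session id is the one from the question, which may well expire.

```python
from urllib.parse import urlencode, urljoin

BASE_URL = "http://talkingbox.dyndns.org:49495/"
SESSION_ID = "3B9054BC032E53EF691A9A1803040F1C"  # id copied from the question


def input_url(question, session_id=SESSION_ID):
    """URL that submits a question; urlencode escapes spaces and symbols."""
    return urljoin(BASE_URL, "in?" + urlencode({"id": session_id, "msg": question}))


def output_url(session_id=SESSION_ID):
    """URL of the 'frout' frame. Assumption: the bot's reply shows up here."""
    return urljoin(BASE_URL, "out?" + urlencode({"id": session_id}))


def ask(question):
    """Send the question, then return the raw HTML of the output frame."""
    import requests  # third-party; imported here so the URL helpers work without it

    with requests.Session() as session:
        session.get(input_url(question), timeout=10).raise_for_status()
        reply = session.get(output_url(), timeout=10)
        reply.raise_for_status()
        return reply.text
```

Calling ask("Hello") would return the HTML of the output frame, which you could then hand to BeautifulSoup. If the page fills in the answer with JavaScript after loading, this will not see it, and a browser-driving tool would be needed instead.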
Upvotes: 0