Reputation: 811
I'm trying to program a web crawler for the message board of an Austrian newspaper called derstandard.at. I'm interested in the interactions and would like to do a network analysis of the users. I was able to retrieve everything I wanted, but when it comes to changing the message board's page, it simply doesn't work.
Using Firefox, I can access the pages I want by changing one number in the URL (the seite parameter), for example page 5.
When I try to access this from my Python script, I always get page 1.
At first I thought this was because of my user agent, but I changed it to my Firefox user agent and I still always get page 1. Why is this?
Here is the relevant code snippet:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import urllib
from BeautifulSoup import BeautifulSoup
from urllib import FancyURLopener
class MyOpener(FancyURLopener):
    # Override the default User-Agent with the one Firefox sends
    version = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:14.0) Gecko/20100101 Firefox/14.0.1'

f_open = MyOpener()
# Request page 5 of the forum (seite=5) and parse the returned HTML
page = BeautifulSoup(f_open.open('http://derstandard.at/1345164506806/Umfrage-FPOe-auf-tiefstem-Stand-seit-mehr-als-zwei-Jahren?seite=5#forumstart'))
print page
Upvotes: 0
Views: 150
Reputation: 43467
According to the OP, my comment solved the problem.
My comment:
Maybe it is the "#"; I have heard it can cause errors sometimes. Put an r at the start of your URL string, like r'http://derstandard.at/1345164506806/Umfrage-FPOe-auf-tiefstem-Stand-seit-mehr-als-zwei-Jahren?seite=5#forumstart'
So it seems it was a simple mistake.
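If the "#" is indeed the culprit, another thing worth trying is to drop the fragment (everything from "#" onward) before opening the URL. The fragment is only a client-side anchor, and browsers do not send it to the server. Here is a minimal sketch of that idea; strip_fragment is a hypothetical helper name, and I am only assuming that the fragment is what confuses the request, I have not checked how the site handles it:

```python
try:
    from urllib.parse import urldefrag  # Python 3
except ImportError:
    from urlparse import urldefrag  # Python 2

URL = 'http://derstandard.at/1345164506806/Umfrage-FPOe-auf-tiefstem-Stand-seit-mehr-als-zwei-Jahren?seite=5#forumstart'


def strip_fragment(url):
    """Return url without its '#...' fragment (hypothetical helper)."""
    # urldefrag returns a (url_without_fragment, fragment) pair
    return urldefrag(url)[0]


# The query string (?seite=5) is kept; only '#forumstart' is removed
clean_url = strip_fragment(URL)
print(clean_url)
```

The resulting clean_url can then be passed to f_open.open(...) in place of the original string.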
Upvotes: 1