supersambo

Reputation: 811

Python urllib gives a different result than Firefox

I'm trying to program a web crawler for the message board of an Austrian newspaper called derstandard.at. I'm interested in the interactions and would like to do a network analysis of the users. I was able to retrieve everything I wanted, but when it comes to changing the message board's page, it simply doesn't work.

Using Firefox, I can simply access the pages I want by changing one number in the URL, for example page 5:

http://derstandard.at/1345164506806/Umfrage-FPOe-auf-tiefstem-Stand-seit-mehr-als-zwei-Jahren?seite=5#forumstart

When I try to access this from my Python script, I always get page 1.

At first I thought this was because of my user agent, but I changed it to my Firefox user agent and I still always get page 1. Why is this?

Here is the relevant code snippet:

#!/usr/bin/python
# -*- coding: utf-8 -*-
import urllib
from BeautifulSoup import BeautifulSoup

from urllib import FancyURLopener
class MyOpener(FancyURLopener):
    version = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:14.0) Gecko/20100101 Firefox/14.0.1'

f_open=MyOpener()

page=BeautifulSoup(f_open.open('http://derstandard.at/1345164506806/Umfrage-FPOe-auf-tiefstem-Stand-seit-mehr-als-zwei-Jahren?seite=5#forumstart'))

print page

Upvotes: 0

Views: 150

Answers (1)

Inbar Rose

Reputation: 43467

According to the OP, my comment to him solved the problem.

My comment:

Maybe it is the "#"; I have heard it can cause errors sometimes. Put an r at the start of your URL string, like r'http://derstandard.at/1345164506806/Umfrage-FPOe-auf-tiefstem-Stand-seit-mehr-als-zwei-Jahren?seite=5#forumstart'

So it seems it was a simple mistake.
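If the "#" really is the culprit, one plausible explanation is that a browser never sends the fragment (the `#forumstart` part) to the server, while a script may pass it along as part of the `seite` value. A minimal sketch of removing the fragment before making the request, assuming Python 3's `urllib.parse` (the question uses Python 2, where the same functions live in `urlparse`); `strip_fragment` and `page_url` are hypothetical helper names, not part of any library:

```python
from urllib.parse import urldefrag, urlencode, urlsplit, urlunsplit


def strip_fragment(url):
    """Return the URL without its '#fragment' part."""
    return urldefrag(url)[0]


def page_url(base_url, page):
    """Hypothetical helper: build the URL of a forum page, without a fragment.

    Assumes the page number is carried by the 'seite' query parameter,
    as in the URL from the question.
    """
    parts = urlsplit(base_url)
    query = urlencode({"seite": page})
    # The empty last element drops any fragment from the original URL.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


url = ('http://derstandard.at/1345164506806/'
       'Umfrage-FPOe-auf-tiefstem-Stand-seit-mehr-als-zwei-Jahren'
       '?seite=5#forumstart')

print(strip_fragment(url))
print(page_url(url, 7))
```

The cleaned URL can then be passed to the opener in place of the original string, and `page_url` can be called in a loop to walk through the pages.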

Upvotes: 1
