AnotherDeveloper

Reputation: 2201

Open a URL in IE, then scrape the same page using Python

I am a novice in Python (I am a C++ developer), and I am trying to get some hands-on practice with web scraping through IE on Windows.

The problem I am facing is that when I open a URL using the "requests" library, the server always sends me a login page. I figured out why: the server presumes you are coming through IE and tries to execute a function that uses some information from the SSO (single sign-on) object, which runs in the background in Windows from the first login to the web server (consider this some weird setup).

After observing this, I changed my strategy and started using the webbrowser library. Now, when I call webbrowser.open("url"), the browser opens the page properly, which is great!

But my problems now are:

1) I do not want the opened browser page to be visible to the user (I want some way to open the browser in the background). I tried this:

ie = webbrowser.BackgroundBrowser(webbrowser.iexplore)
ie.Visible = 0
ie.open('url')

but with no success. The page still opens visibly to the user.

2) [This is the main activity] I want to scrape the page that is open in the IE window launched above. How can I do that? I tried to dig into this link but did not find any APIs for getting the data.

Kindly help.

PS: I have used Beautiful Soup to scrape some other web pages fetched with requests. That was successful and I got the data I wanted, but not in this case.

Upvotes: 0

Views: 961

Answers (1)

payet_s

Reputation: 71

The webbrowser module doesn't allow you to do that. The get function you mentioned retrieves registered web browsers; it does not scrape the result of an HTTP GET request.
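To illustrate the point, here is a small sketch that lists the public methods of the standard library's base browser controller class. The helper name public_methods is made up for this example. The idea is only to show that controllers expose ways to open a URL and nothing for reading a page back.

```python
import webbrowser


def public_methods(cls):
    """Return the sorted public callables defined directly on cls (hypothetical helper)."""
    return sorted(
        name
        for name, value in vars(cls).items()
        if not name.startswith("_") and callable(value)
    )


# BaseBrowser is the parent class of the controllers returned by webbrowser.get().
methods = public_methods(webbrowser.BaseBrowser)
print(methods)
```

The methods listed are all variants of opening a URL; none of them returns the HTML of the page, so the module cannot be used for scraping.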

I don't know what is triggering the behavior you described with IE. Have you tried changing your User-Agent to one of IE's? You can check this post for more details: Sending "User-agent" using Requests library in Python
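A minimal sketch of that suggestion, assuming the server only looks at the User-Agent header. The User-Agent string below is one commonly reported for IE 11 on Windows 10 and is used here as an assumption; the names IE11_USER_AGENT, build_headers and fetch are made up for this example, and "url" stays a placeholder as in the question.

```python
# Assumption: a User-Agent string commonly reported for IE 11 on Windows 10.
IE11_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"
)


def build_headers(user_agent=IE11_USER_AGENT):
    """Build the request headers that carry the chosen User-Agent."""
    return {"User-Agent": user_agent}


def fetch(url, timeout=10):
    """Fetch url with an IE-like User-Agent and return the response body as text."""
    import requests  # third-party library, imported here so the helper above works without it

    response = requests.get(url, headers=build_headers(), timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of returning an error page
    return response.text
```

You would call it as html = fetch("url") and pass the result to Beautiful Soup as you did for the other pages. If the server still returns the login page, the User-Agent was probably not what triggers the behavior.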

Upvotes: 1
