Reputation: 2201
I am a novice in Python (coming from C++), and I am trying to get some hands-on practice with web scraping on Windows with IE.
The problem I am facing is that when I open a URL using the "requests" library, the server always sends me a login page. I figured out why: the server presumes the request is coming through IE and tries to execute a function that uses information from the SSO (single sign-on) component, which runs in the background on Windows after the first login to the web server (consider this a somewhat weird setup).
Having observed this, I changed my strategy and started using the webbrowser library. Now, when I call webbrowser.open("url"), the browser opens the page properly, which is great!
But my problems now are:
1) I do not want the opened browser page to be visible to the user (I want some way to open the browser in the background). I tried this:
ie = webbrowser.BackgroundBrowser(webbrowser.iexplore)
ie.Visible = 0
ie.open('url')
but with no success. The page still opens visibly to the user.
2) [This is the main task] I want to scrape the page opened in the IE window above. How can I do that? I tried to dig into this link but did not find any APIs for getting the data.
Kindly help.
PS: I tried using Beautiful Soup with requests to scrape some other web pages. It was successful and I got the data I wanted, but not in this case.
Upvotes: 0
Views: 961
Reputation: 71
The webbrowser module doesn't allow you to do that. The get function you mentioned retrieves a registered web browser; it does not perform an HTTP GET request.
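To see why, here is a small sketch that lists the public methods of a webbrowser controller class. The helper name controller_methods is made up for illustration; BackgroundBrowser is the class used in the question. Everything a controller offers is a way to open a URL, and nothing returns the page content to Python.

```python
import webbrowser


def controller_methods(controller_cls=webbrowser.BackgroundBrowser):
    """Return the public callable attributes of a webbrowser controller class.

    controller_methods is a hypothetical helper for illustration only.
    """
    return sorted(
        name
        for name in dir(controller_cls)
        if not name.startswith("_") and callable(getattr(controller_cls, name))
    )


if __name__ == "__main__":
    # Only ways of opening a URL show up here; there is no method
    # for reading the HTML of the page that was opened.
    print(controller_methods())
```

So the module can launch a browser for you, but it hands nothing back that Beautiful Soup could parse.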
I don't know what is triggering the behavior you described with IE. Have you tried changing your User-Agent to one of IE's? You can check this post for more details: Sending "User-agent" using Requests library in Python
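A minimal sketch of that idea, assuming an IE 11 User-Agent string on Windows 10; swap in whatever your own IE sends. The names IE11_USER_AGENT, build_headers and fetch are made up for this example. Whether this gets you past the login page depends on what the server actually checks; if it relies on Windows-integrated sign-on, the header alone may not be enough.

```python
# Assumed User-Agent string for IE 11 on Windows 10; replace it with the
# one your own browser sends if it differs.
IE11_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"
)


def build_headers(user_agent=IE11_USER_AGENT):
    """Return request headers that present the client as the given browser."""
    return {"User-Agent": user_agent}


def fetch(url, timeout=10):
    """Fetch url with the IE User-Agent and return the response body as text."""
    # requests is a third-party package; it is imported here so that
    # build_headers() can be used even where requests is not installed.
    import requests

    response = requests.get(url, headers=build_headers(), timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.text
```

The text returned by fetch("url") can then be passed to Beautiful Soup the same way you did for the other pages.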
Upvotes: 1