Reputation: 14998
I'm working on a C# application that needs to scrape some data from a phpBB forum. The forum scraping requires logging in. The application will prompt the user for their login credentials to connect.
I've scraped websites before with C#, but I'm not sure how to log in to phpBB and keep the session open for the duration of the scraping. I've done some searching and haven't had much luck. Is there a good way to do this programmatically?
Upvotes: 0
Views: 900
Reputation: 4081
I would recommend using the WatiN API for screen scraping. I have used it for screen scraping before and it works well. Check it out!
Upvotes: 0
Reputation: 8595
You don't say what you've tried, but if you are using HttpWebRequest objects to retrieve pages and/or log on, you need to assign a new CookieContainer to the request's CookieContainer property so that any cookies returned by the website are stored. Share that same container among all your HttpWebRequest objects to remain logged in.
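A rough sketch of that idea, not tested against a real forum: one CookieContainer instance is attached to every request, so the session cookie set by the login response is sent back on later page fetches. The class name `ForumSession`, the host `forum.example.com`, the login path `ucp.php?mode=login`, and the form field names `username`, `password` and `login` are all assumptions for illustration; check the actual login form of the forum you are scraping.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper: wraps one CookieContainer shared by all requests.
class ForumSession
{
    private readonly CookieContainer cookies = new CookieContainer();

    // Sends a URL-encoded form POST and returns the response body.
    public string Post(string url, string formBody)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.CookieContainer = cookies; // cookies from the response are stored here

        byte[] payload = Encoding.UTF8.GetBytes(formBody);
        request.ContentLength = payload.Length;
        using (Stream body = request.GetRequestStream())
        {
            body.Write(payload, 0, payload.Length);
        }
        return ReadResponse(request);
    }

    // Fetches a page, sending along any cookies collected so far.
    public string Get(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = cookies; // same container, so the session is reused
        return ReadResponse(request);
    }

    private static string ReadResponse(HttpWebRequest request)
    {
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}

class Program
{
    static void Main(string[] args)
    {
        // Credentials come from the user; here they are read from the command line.
        string user = args[0];
        string pass = args[1];

        var session = new ForumSession();

        // Assumed field names and login URL; verify them against the real form.
        string form = "username=" + Uri.EscapeDataString(user)
                    + "&password=" + Uri.EscapeDataString(pass)
                    + "&login=Login";
        session.Post("http://forum.example.com/ucp.php?mode=login", form);

        // Later requests reuse the stored session cookie.
        string page = session.Get("http://forum.example.com/index.php");
        Console.WriteLine(page.Length);
    }
}
```

Some forums also expect hidden fields from the login page (a session id or a redirect target, for example), so it may be necessary to fetch the login page first with `Get` and copy those values into the POST body.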
Upvotes: 1
Reputation: 38442
Look for the names of the username and password fields using Firebug or Chrome (or even View Source), then use my answer here: Programmatically logging into a site, replacing 'session_key' and 'session_password' as appropriate. That should work.
And then translate it to C#!
Upvotes: 0