Mark Harrison

Reputation: 304434

How do I curl or wget a web page?

I would like to make a nightly cron job that fetches my stackoverflow page and diffs it from the previous day's page, so I can see a change summary of my questions, answers, ranking, etc.

Unfortunately, I couldn't get the right set of cookies, etc., to make this work. Any ideas?

Also, when the beta is finished, will my status page be accessible without logging in?
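
For the record, here's the shape of the nightly job I have in mind (just a sketch: the paths and the cookie value are placeholders, and it assumes the fetch itself works):

#!/bin/sh
# Fetch today's copy of my profile page and diff it against yesterday's.
URL="https://stackoverflow.com/users/30/myProfile.html"   # placeholder URL
DIR="$HOME/so-snapshots"
mkdir -p "$DIR"

TODAY="$DIR/$(date +%F).html"
YESTERDAY="$DIR/$(date -d yesterday +%F).html"   # GNU date syntax

curl -s --cookie "soba=(LookItUpYourself)" "$URL" > "$TODAY"

# Print the change summary; cron mails any output to me.
[ -f "$YESTERDAY" ] && diff -u "$YESTERDAY" "$TODAY"

A crontab line such as 0 3 * * * $HOME/bin/so-diff.sh would then run it nightly.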

Upvotes: 19

Views: 9110

Answers (5)

Grant

Reputation: 12039

Your status page is available now without logging in (click logout and try it). When the beta cookie is disabled, there will be nothing between you and your status page.

For wget:

wget --no-cookies --header "Cookie: soba=(LookItUpYourself)" https://stackoverflow.com/users/30/myProfile.html

Upvotes: 9

Grant

Reputation: 12039

From Mark Harrison:

And here's what works...

curl -s --cookie soba=. https://stackoverflow.com/users

And for wget:

wget --no-cookies --header "Cookie: soba=(LookItUpYourself)" https://stackoverflow.com/users/30/myProfile.html

Upvotes: 6

Mark Harrison

Reputation: 304434

And here's what works...

curl -s --cookie soba=. http://stackoverflow.com/users

Upvotes: 2

Ryan Ahearn

Reputation: 7934

I couldn't figure out how to get the cookies to work either, but I was able to get to my status page in my browser while I was logged out, so I assume this will work once Stack Overflow goes public.

This is an interesting idea, but won't you also pick up diffs of the underlying HTML code? Do you have a strategy to avoid ending up with a diff of the HTML rather than the actual content?
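
One way around that might be to diff the rendered text rather than the raw HTML, e.g. by piping the fetched page through a text-mode browser first (a sketch, assuming lynx is installed; html2text would do much the same):

curl -s --cookie "soba=(LookItUpYourself)" https://stackoverflow.com/users/30/myProfile.html | lynx -dump -stdin > today.txt
diff -u yesterday.txt today.txt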

Upvotes: 2

sparkes

Reputation: 19503

Nice idea :)

I presume you've tried wget's

--load-cookies (filename)

option? It might help a little, but it may be easier to use something like Mechanize (in Perl or Python) to mimic a browser more fully and get a good spider.
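
For completeness, --load-cookies expects a Netscape-format cookies.txt, e.g. one exported from your browser (a sketch; how you export the file depends on the browser):

wget --load-cookies cookies.txt https://stackoverflow.com/users/30/myProfile.html

wget can also capture cookies itself with --save-cookies and --keep-session-cookies if you'd rather log in from the command line.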

Upvotes: 3
