Reputation: 1725
I had some time off recently and thought it would be a neat exercise to see how quickly I could put together a working program to automatically retrieve '.torrent' files for me. I'm aware there are existing solutions, but this was more of a programming exercise.
All was well: the program ran, checked the sites for new torrents, and attempted to download them. But here's where I'm running into a problem: one of the sites is giving me a file containing this instead of the .torrent file when I try to download it:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>400 Bad Request</title>
</head><body>
<h1>Bad Request</h1>
<p>Your browser sent a request that this server could not understand.<br />
</p>
<hr>
<address>Apache/2.2.3 (CentOS) Server at forums.mvgroup.org Port 80</address>
</body></html>
My first thought was a broken link, but I went and successfully downloaded the file in my browser, so that's not it. My next thought was that maybe I'm not downloading the file correctly. This is the example that I used, and this is the actual code that's doing the downloading in my program.
I have a sneaking suspicion this is going to turn out to be one of those brain-dead simple gotchas, but I'm having a heck of a time figuring it out. Does anyone know why I'm getting a 400, or how to fix this?
Upvotes: 1
Views: 398
Reputation: 75406
You need a logging proxy in between, so you can see which bytes go over the wire.
If you use Eclipse, it has an HTTP proxy available. I believe it is part of the Eclipse Java EE download.
Upvotes: 0
Reputation: 160601
A broken link should return a 404 Not Found error. Because you can retrieve the file with a browser, I see two other likely issues: either your code isn't handling redirects that the browser follows automatically, or it's missing a needed session ID, cookie, or some other state value. Again, a browser handles those for you, but your code won't unless you write that in or take advantage of the right gem.
The sample code you link to at http://snippets.dzone.com/posts/show/2469 is rudimentary and isn't wired to follow redirects, which is what I suspect you need. I glanced at your code and it doesn't handle them either. The "Following Redirection" sample code in the docs for Net::HTTP shows how to do it.
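For reference, a minimal sketch of that redirection-following pattern, using a placeholder URL and filename rather than your actual ones:

require 'net/http'
require 'uri'

# Follow up to `limit` redirects before giving up, per the Net::HTTP docs.
def fetch(uri_str, limit = 10)
  raise ArgumentError, 'too many HTTP redirects' if limit == 0

  response = Net::HTTP.get_response(URI(uri_str))
  case response
  when Net::HTTPSuccess     then response
  when Net::HTTPRedirection then fetch(response['location'], limit - 1)
  else response.value # raises an exception for error responses such as your 400
  end
end

# 'wb' matters: a .torrent file is binary, not text.
File.open('some.torrent', 'wb') do |file|
  file.write(fetch('http://example.com/some.torrent').body)
end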
Rather than writing the URL-retrieval code yourself, which amounts to reinventing the wheel, I recommend Ruby's OpenURI (the open-uri standard library), because it handles redirects automatically along with time-out retries. It's easy to use and a good workhorse for those normal "get a URL" jobs.
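The same download sketched with OpenURI, again with a placeholder URL and filename:

require 'open-uri'

# OpenURI follows redirects for us.
# (On Rubies older than 2.5, Kernel#open serves in place of URI.open.)
URI.open('http://example.com/some.torrent', 'rb') do |remote|
  File.open('some.torrent', 'wb') { |local| local.write(remote.read) }
end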
If you want a gem that handles redirects, cookies, and session IDs, look at Mechanize. It's a very good gem for general-purpose tasks, though it's really designed for navigating web sites.
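A hedged sketch of what that looks like; Mechanize carries cookies and session state on the agent between requests, and the URLs here are placeholders:

require 'mechanize'

agent = Mechanize.new
# If the site needs a login first, submit its login form with the agent;
# the resulting cookies are kept and sent with the download request.
file = agent.get('http://example.com/some.torrent')
file.save('some.torrent') # Mechanize::File#save writes the body to disk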
For more robust tasks, Curb and Typhoeus are good because they can handle multiple requests in parallel, though you'll need to write a bit more code to manage the files and navigate the sites. For a simple file download either would be fine.
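For instance, a sketch of queuing several downloads in parallel with Typhoeus's Hydra (placeholder URLs):

require 'typhoeus'

hydra = Typhoeus::Hydra.new
urls  = %w[http://example.com/a.torrent http://example.com/b.torrent]

urls.each do |url|
  request = Typhoeus::Request.new(url, followlocation: true)
  request.on_complete do |response|
    File.open(File.basename(url), 'wb') { |f| f.write(response.body) }
  end
  hydra.queue(request)
end

hydra.run # blocks until every queued request has completed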
Upvotes: 2