Reputation: 2271
I have a script that gets the GeoIP locations of various IPs. It runs daily, and I expect to have around 50,000 IPs to look up.
I have a GeoIP system set up - I would just like to eliminate having to run wget 50,000 times per report.
What I was thinking is that there must be some way to have wget open a connection to the URL and then pass it the IPs, so it doesn't have to re-establish the connection for each one.
Any help will be much appreciated.
Upvotes: 1
Views: 1918
Reputation: 10001
You could also write a small program (in Java or C or whatever) that sends the whole list of IPs as a single POST request and have the server return an object with data about them. It shouldn't be too slow either.
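As a minimal sketch in Ruby (the endpoint URL, the plain-text request body, and the existence of a batch-lookup route on your GeoIP service are all assumptions - adjust them to whatever your server actually accepts):
#!/usr/bin/env ruby
require 'net/http'
require 'uri'

# Hypothetical batch endpoint - adjust to whatever your GeoIP service exposes.
uri = URI('http://geoip.example.com/batch')

# One IP per line in the input file.
ips = File.readlines('ips.txt', chomp: true)

# Send the whole list in a single POST request over one connection.
response = Net::HTTP.post(uri, ips.join("\n"), 'Content-Type' => 'text/plain')
puts response.body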
Upvotes: 0
Reputation: 6453
You could also write a threaded Ruby script to run wget on multiple input files simultaneously to speed the process up. So if you have 5 files containing 10,000 addresses each, you could use this script:
#!/usr/bin/ruby
threads = []

# Start one thread per input file given on the command line.
for file in ARGV
  threads << Thread.new(file) do |filename|
    # Each wget call downloads every address listed in its file.
    system("wget -i #{filename}")
  end
end

# Wait for all of the downloads to finish.
threads.each { |thrd| thrd.join }
Each of these threads would use one connection to download all of the addresses in its file (assuming the server supports keep-alive). The following command then means only 5 connections to the server to fetch all 50,000 addresses.
./fetch.rb "list1.txt" "list2.txt" "list3.txt" "list4.txt" "list5.txt"
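If you start from one master list of IPs rather than ready-made input files, a sketch like the following could produce the five lists first (the file name all_ips.txt and the lookup URL pattern are assumptions - substitute your GeoIP service's real URL format):
#!/usr/bin/env ruby
# Split a master list of IPs into 5 wget input files (list1.txt .. list5.txt).
ips = File.readlines('all_ips.txt', chomp: true)
slice_size = (ips.size / 5.0).ceil

ips.each_slice(slice_size).with_index(1) do |chunk, i|
  File.open("list#{i}.txt", 'w') do |f|
    # Hypothetical lookup URL - replace with your GeoIP service's format.
    chunk.each { |ip| f.puts "http://geoip.example.com/lookup?ip=#{ip}" }
  end
end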
Upvotes: 0
Reputation: 204698
If you give wget several addresses at once, with consecutive addresses belonging to the same HTTP/1.1 (Connection: keep-alive) supporting server, wget will re-use the already-established connection.
If there are too many addresses to list on the command line, you can write them to a file and use the -i/--input-file= option (and, per UNIX tradition, -i-/--input-file=- reads standard input).
There is, however, no way to preserve a connection across different wget invocations.
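For example, a small Ruby wrapper could feed all of the lookup URLs to a single wget invocation over standard input, so the whole report runs in one process and connections to the GeoIP server can be re-used (the input file name and URL pattern below are assumptions, not part of any particular GeoIP API):
#!/usr/bin/env ruby
# Pipe every lookup URL into one wget process via -i - (read URLs from stdin).
IO.popen(['wget', '-q', '-O', 'results.txt', '-i', '-'], 'w') do |wget|
  File.foreach('all_ips.txt') do |line|
    ip = line.strip
    # Hypothetical lookup URL - replace with your GeoIP service's format.
    wget.puts "http://geoip.example.com/lookup?ip=#{ip}" unless ip.empty?
  end
end
Because of the single -O option, all of the responses end up concatenated in results.txt, ready for the daily report to parse.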
Upvotes: 2