Reputation: 2983
All,
I would like to get a list of files off of a server with the full URL intact. For example, I would like to get all the TIFFs from here.
http://hyperquad.telascience.org/naipsource/Texas/20100801/*
I can download all the .tif files with wget, but what I am looking for is just the full URL to each file, like this:
http://hyperquad.telascience.org/naipsource/Texas/20100801/naip10_1m_2597_04_2_20100430.tif
http://hyperquad.telascience.org/naipsource/Texas/20100801/naip10_1m_2597_04_3_20100424.tif
http://hyperquad.telascience.org/naipsource/Texas/20100801/naip10_1m_2597_04_4_20100430.tif
http://hyperquad.telascience.org/naipsource/Texas/20100801/naip10_1m_2597_05_1_20100430.tif
http://hyperquad.telascience.org/naipsource/Texas/20100801/naip10_1m_2597_05_2_20100430.tif
Any thoughts on how to get all these files into a list using something like curl or wget?
Adam
Upvotes: 6
Views: 70040
Reputation: 39
I have a client-server system that retrieves the file names from an assigned folder on the app server, then displays thumbnails in the client. On the client side, slThumbnailNames is a string list. On the server side, a TIdCmdTCPServer has a CommandHandler named GetThumbnailNames (a command handler is a procedure).
Hints: sMFFBServerPictures is set in the OnCreate method of the app server. sThumbnailDir is passed to the app server from the client.
{ CLIENT: ask the server for the thumbnail names and capture the reply }
slThumbnailNames := funGetThumbnailNames(sThumbnailPath);

function TfMFFBClient.funGetThumbnailNames(sThumbnailPath: string): TStringList;
var
  slThisStringList: TStringList;
begin
  slThisStringList := TStringList.Create;
  // 700 is the response code the command handler is expected to send back
  dmMFFBClient.tcpMFFBClient.SendCmd('GetThumbnailNames,' + sThumbnailPath, 700);
  // read the multi-line reply into the string list
  dmMFFBClient.tcpMFFBClient.IOHandler.Capture(slThisStringList);
  Result := slThisStringList;
end;

{ SERVER: command handler that lists the *_t.jpg thumbnails in the folder }
procedure TfMFFBServer.MFFBCmdTCPServercmdGetThumbnailNames(
  ASender: TIdCommand);
var
  SRec: TSearchRec;
  sThumbnailDir: string;
  iNumFiles: Integer;
begin
  try
    ASender.Response.Clear;
    sThumbnailDir := ASender.Params[0];
    // FindFirst/FindNext return 0 as long as there is another match
    iNumFiles := FindFirst(sMFFBServerPictures + sThumbnailDir + '*_t.jpg',
      faAnyFile, SRec);
    if iNumFiles = 0 then
      try
        while iNumFiles = 0 do
        begin
          // skip directories; add only the file names to the response
          if (SRec.Attr and faDirectory) <> faDirectory then
            ASender.Response.Add(SRec.Name);
          iNumFiles := FindNext(SRec);
        end;
      finally
        FindClose(SRec);
      end
    else
      ASender.Response.Add('NO THUMBNAILS');
  except
    on e: Exception do
      MessageDlg('Error in procedure TfMFFBServer.MFFBCmdTCPServercmdGetThumbnailNames' + #13 +
        'Error msg: ' + e.Message, mtError, [mbOK], 0);
  end;
end;
Upvotes: 0
Reputation: 11
WinSCP has a Find window that can search for all files in the directories and subdirectories under a given directory on the remote site. You can then select all the results and copy them, which gives you the full links to all the files as text. You need a username and password to connect over FTP:
https://winscp.net/eng/download.php
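If you would rather script that same FTP listing, here is a minimal sketch with curl, assuming the server accepts FTP logins for this path (username:password are placeholders):

# -l asks curl for a name-only FTP directory listing (NLST);
# keep the .tif entries and prepend the HTTP base URL to each name
curl -s -l -u username:password 'ftp://hyperquad.telascience.org/naipsource/Texas/20100801/' \
  | grep '\.tif$' \
  | awk '{ print "http://hyperquad.telascience.org/naipsource/Texas/20100801/" $0 }'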
Upvotes: 1
Reputation: 95459
If you wget http://hyperquad.telascience.org/naipsource/Texas/20100801/, the HTML that is returned contains the list of files. If you don't need this to be general, you could use regexes to extract the links. If you need something more robust, you can use an HTML parser (e.g. BeautifulSoup) and programmatically extract the links from the page's actual HTML structure.
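A minimal sketch of the regex approach with curl, grep, and sed, assuming the hrefs on the index page are relative file names (if they are absolute URLs, drop the last sed stage):

# pull the href="..." attributes out of the index page, strip the
# href wrapper, keep the .tif entries, and prepend the base URL
base=http://hyperquad.telascience.org/naipsource/Texas/20100801/
curl -s "$base" \
  | grep -o 'href="[^"]*\.tif"' \
  | sed -e 's/^href="//' -e 's/"$//' \
  | sed -e "s|^|$base|"

An HTML parser is still the safer route if the markup is less predictable than this.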
Upvotes: 1
Reputation: 6930
I would use the lynx shell web browser to get the list of links, plus the grep and awk shell tools to filter the results, like this:
lynx -dump -listonly <URL> | grep http | grep <regexp> | awk '{print $2}'
..where:
<URL> is http://hyperquad.telascience.org/naipsource/Texas/20100801/
<regexp> is \.tif$
Complete example command line to get links to TIF files on this SO page:
lynx -dump -listonly http://stackoverflow.com/questions/6989681/getting-a-list-of-files-on-a-web-server | grep http | grep \.tif$ | awk '{print $2}'
..now returns:
http://hyperquad.telascience.org/naipsource/Texas/20100801/naip10_1m_2597_04_2_20100430.tif
http://hyperquad.telascience.org/naipsource/Texas/20100801/naip10_1m_2597_04_4_20100430.tif
http://hyperquad.telascience.org/naipsource/Texas/20100801/naip10_1m_2597_05_2_20100430.tif
Upvotes: 4
Reputation: 717
You'd need the server to be willing to give you a page with a listing on it. This would normally be an index.html, or you can just ask for the directory:
http://hyperquad.telascience.org/naipsource/Texas/20100801/
It looks like you're in luck in this case, so, at the risk of upsetting the webmaster, the solution would be to use wget's recursive option. Specify a maximum recursion depth of 1 to keep it constrained to that single directory; a sketch of the list-only variant follows.
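Since the question asks for a list of URLs rather than the downloads themselves, one way is wget's --spider mode, which walks the links without saving anything, and then grepping the URLs out of its log. The exact log format varies between wget versions, so treat this as a sketch:

# -r -l1: recurse exactly one level; -np: never ascend to the parent dir
# --spider: check each link without downloading; URLs are scraped from the log
wget -r -l1 -np --spider http://hyperquad.telascience.org/naipsource/Texas/20100801/ 2>&1 \
  | grep -o 'http://[^ ]*\.tif' \
  | sort -u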
Upvotes: 5