azemi

Reputation: 381

Get file size after downloading URL content with wget

I'm trying to write a bash script that will download the contents of a URL (not recursive) and then analyze the file that was downloaded.

If the downloaded file is a text file (e.g., index.html), I want to know the size of the file and count the number of characters in it.

If the file is an image file I just want to know the file size.

Right now I'm working with wget and downloading the contents of the input URL, but the problem is that when I do this inside my script I don't know the file name of the file that was downloaded.

So, the two main questions are:

  1. How can I get the name of the file that wget downloaded, so my script can analyze it?
  2. How can I determine the file type of the downloaded file?

Upvotes: 1

Views: 565

Answers (2)

azemi

Reputation: 381

I did finally manage to solve it.

#!/usr/bin/env bash
URL="$1"
FILENAME=$(date +%y-%m-%d-%T) #Set the current date and time as the filename
wget -O "$FILENAME" "$URL"    #Download the content from the URL and set the filename
FILE_INFO=$(file "$FILENAME") #Store the output from the 'file' command

if [[ "$FILE_INFO" == *"text"* ]]
then 
 echo "It's a text file"
elif [[ "$FILE_INFO" == *"image"* ]]
then 
 echo "It's an image"
fi
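
The script above identifies the file type but does not yet print the size or the character count that the question asks for. One possible way to add that is sketched below. The function names (file_size, char_count, analyze) are made up for illustration, and the sketch matches on the MIME type from `file --mime-type` rather than on the descriptive text used above. It assumes the `file` utility is installed.

```bash
#!/usr/bin/env bash
# Sketch only: file_size, char_count and analyze are hypothetical helper names.

# Print the size of a file in bytes (wc -c counts bytes).
file_size() {
  wc -c < "$1" | tr -d '[:space:]'
}

# Print the number of characters (wc -m counts characters in the current locale).
char_count() {
  wc -m < "$1" | tr -d '[:space:]'
}

# Report on a file according to the MIME type that 'file' detects.
analyze() {
  local path="$1" mime
  mime=$(file -b --mime-type "$path")
  case "$mime" in
    text/*)  echo "text: $(file_size "$path") bytes, $(char_count "$path") characters" ;;
    image/*) echo "image: $(file_size "$path") bytes" ;;
    *)       echo "other ($mime): $(file_size "$path") bytes" ;;
  esac
}

# Run only when a file name is passed, e.g. ./analyze.sh "$FILENAME"
if [ "$#" -gt 0 ]; then
  analyze "$1"
fi
```

In the script above, this would amount to calling analyze "$FILENAME" after the wget line. Note that the byte count and the character count differ when the file contains multi-byte characters and the locale is UTF-8.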

Special thanks to Ben Scott for the help!

Upvotes: 2

Ben Scott

Reputation: 318

I would suggest setting the file name wget writes to, using the -O switch. You can then generate a file name, tell wget to download the URL to that file, and run whatever analysis tools you want on the file name you picked.

The idea here is that you do not have to figure out what name the web site or URL or wget will pick -- you are controlling the parameters. That is a useful programming technique in general. The less input you leave to the user or to some external program or website, the simpler and more robust your code will be.

As for picking a file name, you could use a timestamp. The date utility can generate a timestamp for you, if you give it a +FORMAT parameter. Alternatively, since you mention this is part of an analysis tool, maybe you don't want to save the file at all. In that case, try a tool like mktemp to generate a guaranteed unique file name, and then remove it before exiting.
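
As a rough illustration of the two naming options described above, here is a minimal sketch, not a complete solution. The function names timestamp_name and new_temp_file are hypothetical, the date format is just one choice, and the wc call is only a placeholder for whatever analysis is actually needed.

```bash
#!/usr/bin/env bash
# Sketch only: timestamp_name and new_temp_file are hypothetical helper names.

# Option 1: build a file name from the current date and time.
timestamp_name() {
  date +%Y-%m-%d-%H%M%S
}

# Option 2: let mktemp create a unique, empty temporary file and print its name.
new_temp_file() {
  mktemp
}

# Run only when a URL is passed as the first argument.
if [ "$#" -gt 0 ]; then
  target=$(new_temp_file) || exit 1
  trap 'rm -f "$target"' EXIT        # delete the temporary file on exit
  wget -q -O "$target" "$1" || exit 1
  wc -c < "$target"                  # placeholder for the real analysis
fi
```

The trap removes the temporary file when the script exits, including when the download fails, so nothing is left behind.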

For more information, see the manual pages wget(1), date(1), and mktemp(1).

I'm not giving complete working code, in case anyone ever gets this as a school assignment and stumbles across this question. I wouldn't want to make it too easy for that hypothetical person. ;-) Of course, if someone asked more specific questions, I'd likely clarify my answer for them.

Upvotes: 1
