Chris

Reputation: 107

Using sed to read byte count of a website from wget

I'm trying to only print a small portion of the output of a wget command. If I type

wget http://google.com --spider --server-response

I receive a long list of output to the terminal that I want to search. One of those lines is

Content-Length: 219

All I want to do is read that line and print the number 219 to stdout. I found an answer in another Stack Overflow thread (get file size of a file to wget before wget-ing it?):

wget http://google.com --spider --server-response -O - 2>&1 | sed -ne '/Content-Length/{s/.*: //;p}'

I'm having two main difficulties understanding this command. I was hoping someone could explain to me in detail about these two things.

  1. sed usually requires an input file right? Piping the output from the wget command doesn't make it a file. How come it works without this?

  2. I don't understand what -e means. I've looked at the Linux man pages, which say it is for a "script". This flag seems important because without it, nothing works. What does it mean? Also, what is happening in the rest of the command, and how does it print just the number?

Sorry to ask a previously answered question, but I haven't found any explanation online that makes sense, and I'd also like to try an alternate solution!

Upvotes: 0

Views: 774

Answers (2)

Barmar

Reputation: 780724

sed usually requires an input file right? Piping the output from the wget command doesn't make it a file. How come it works without this?

Like most Unix utilities, sed will process files if they're given as arguments, otherwise it will process its standard input.
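As a minimal sketch of that behaviour, the snippet below feeds the same line to sed twice, once through a pipe and once from a file. The sample text and the variable names are made up for illustration; they stand in for what wget would print.

```shell
# Made-up sample standing in for one line of wget's header output.
sample='  Content-Length: 219'

# 1. No file argument: sed reads the text from standard input (the pipe).
from_pipe=$(printf '%s\n' "$sample" | sed 's/.*: //')

# 2. File argument: sed reads the same text from the named file instead.
tmpfile=$(mktemp)
printf '%s\n' "$sample" > "$tmpfile"
from_file=$(sed 's/.*: //' "$tmpfile")
rm -f "$tmpfile"

# Both lines of output should be identical.
echo "$from_pipe"
echo "$from_file"
```

Either way sed sees the same stream of lines, so the script itself does not need to change.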

I don't understand what -e means. I've looked up the linux man pages and it mentions it is for "script" ? What does that mean? Also, what is happening in the line with the quotes?

-e indicates that the next argument is a string of sed operations to execute (the documentation calls this a "script"). This is already the default for the first non-option argument to sed, but the command you found happens to use it explicitly. It's mostly useful when you're giving multiple scripts, because without -e before the additional ones they would be treated as filenames. See also

what does dash e(-e) mean in sed commands?
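A small sketch of the difference, using a made-up sample line rather than real wget output (the variable names are only for illustration):

```shell
# Made-up sample line.
sample='Content-Length: 219'

# With a single script, -e is optional: these two commands are equivalent.
implicit=$(printf '%s\n' "$sample" | sed 's/.*: //')
explicit=$(printf '%s\n' "$sample" | sed -e 's/.*: //')

# With several scripts, each one gets its own -e and they run in order.
chained=$(printf '%s\n' "$sample" | sed -e 's/Content-Length/Size/' -e 's/: / = /')

echo "$implicit"
echo "$explicit"
echo "$chained"
```

In the chained case, dropping the second -e would make sed try to open a file named `s/: / = /`.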

In your command, the -n option means that sed should not print its input lines by default -- you'll use the p operation to print selected lines explicitly. /Content-Length/ matches lines that contain that string, and this is followed by a set of operations to perform on those matching lines in {}. The first operation is s/.*: //, which replaces everything up to the : and the space after it with nothing. The second operation is p, which prints the modified line. So that prints the number after Content-Length:.
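To see each piece in isolation, here is a rough sketch that runs the script against made-up header text shaped like wget's --server-response output. The header values are invented, and the `{...;p}` form without a semicolon before the closing brace assumes GNU sed, as in the original command.

```shell
# Made-up headers; wget indents server response lines like this.
headers='  HTTP/1.1 200 OK
  Content-Type: text/html
  Content-Length: 219
  Connection: close'

# Step 1: -n suppresses default output, and p prints only the matching line.
matched=$(printf '%s\n' "$headers" | sed -n '/Content-Length/p')

# Step 2: on the matching line, strip everything through ": " and then print.
size=$(printf '%s\n' "$headers" | sed -n '/Content-Length/{s/.*: //;p}')

echo "$matched"
echo "$size"
```

The Content-Type line also contains ": ", but it is left alone because the substitution only runs on lines that match /Content-Length/.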

Upvotes: 5

Arjun Mathew Dan

Reputation: 5298

You can shorten that command further (neither wget's -O - nor sed's -e is required):

wget http://google.com --spider --server-response 2>&1 | sed -n '/Content-Length/{s/.*: //;p}'

Here, 2>&1 redirects stderr to stdout so that sed can operate on wget's output. The sed command suppresses automatic printing (-n); then, for lines containing Content-Length, it removes everything from the beginning of the line through the colon and the following space, and prints the modified line (p).

Same with awk:

wget http://google.com --spider --server-response 2>&1 | awk '/Content-Length/{print $2}'

For lines containing Content-Length, print the second field (which will be the number part).
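A quick way to check the awk version without hitting the network is to pipe in made-up header text (the values below are invented for illustration):

```shell
# Made-up headers, indented the way wget prints them.
headers='  HTTP/1.1 200 OK
  Content-Length: 219'

# awk splits on runs of whitespace and ignores leading blanks,
# so $1 is "Content-Length:" and $2 is the number.
size=$(printf '%s\n' "$headers" | awk '/Content-Length/{print $2}')

echo "$size"
```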

Upvotes: 1
