Wget page title

Question

Is it possible to get a page's title with Wget from the command line?

input:

$ wget http://bit.ly/rQyhG5



output:

If it’s broke, fix it right   - Keeping it Real Estate. Home

jfg956 · Accepted Answer

This script would give you what you need:

wget --quiet -O - http://bit.ly/rQyhG5 \
  | sed -n -e 's!.*<title>\(.*\)</title>.*!\1!p'

But there are lots of situations where it breaks, including if there is a <title>...</title> in the body of the page, or if the title spans more than one line.
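To make the multi-line failure concrete, here is a minimal sketch that runs the same kind of sed expression on local input instead of a download. The helper name title_simple and the sample HTML are made up for illustration.

```shell
# Hypothetical helper: a one-line <title> matcher reading HTML on stdin.
title_simple() {
  sed -n -e 's!.*<title>\(.*\)</title>.*!\1!p'
}

# Title on a single line: prints "Hello".
printf '<html><head><title>Hello</title></head></html>\n' | title_simple

# Title split over two lines: no single line contains both tags,
# so sed finds no match and prints nothing.
printf '<head><title>Hello\nworld</title></head>\n' | title_simple
```

sed works line by line, so the opening and closing tags must land on the same line for the substitution to match.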

This might be a little better:

wget --quiet -O - http://bit.ly/rQyhG5 \
  | paste -s -d " "  \
  | sed -e 's!.*<head>\(.*\)</head>.*!\1!' \
  | sed -e 's!.*<title>\(.*\)</title>.*!\1!'

but it does not fit your case, as the opening head tag of your page carries attributes.

Again, this might be better:

wget --quiet -O - http://bit.ly/rQyhG5 \
  | paste -s -d " "  \
  | sed -e 's!.*<head[^>]*>\(.*\)</head>.*!\1!' \
  | sed -e 's!.*<title>\(.*\)</title>.*!\1!'

but there are still ways to break it, including a page with no head or title.
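A small sketch of both behaviours on local input, assuming the pipeline matches an opening head tag with optional attributes. The helper name title_joined, the class attribute, and the sample pages are made up. The explicit - operand to paste is an addition for implementations that require a file argument.

```shell
# Hypothetical helper: join all lines, keep the head, then keep the title.
title_joined() {
  paste -s -d " " - \
    | sed -e 's!.*<head[^>]*>\(.*\)</head>.*!\1!' \
    | sed -e 's!.*<title>\(.*\)</title>.*!\1!'
}

# Head with an attribute, multi-line title, and a decoy title in the body:
# prints "Multi line".
printf '<html>\n<head class="x">\n<title>Multi\nline</title>\n</head>\n<body><title>decoy</title></body></html>\n' \
  | title_joined

# No head at all: neither substitution matches, and because sed runs
# without -n the whole page is printed unchanged.
printf '<html><body>no head</body></html>\n' | title_joined
```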

Again, a better solution might be:

wget --quiet -O - http://bit.ly/rQyhG5 \
  | paste -s -d " "  \
  | sed -n -e 's!.*<head[^>]*>\(.*\)</head>.*!\1!p' \
  | sed -n -e 's!.*<title>\(.*\)</title>.*!\1!p'

but I am sure we can find a way to break it. This is why a true XML parser is the right solution, but as your question is tagged shell, the above is the best I can come up with.
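Two ways to break it, sketched on local input. The helper name title_strict and the sample pages are made up, and the pipeline is assumed to use -n with the p flag so that nothing is printed when a pattern does not match.

```shell
# Hypothetical helper: like the pipeline above, printing only on a match.
title_strict() {
  paste -s -d " " - \
    | sed -n -e 's!.*<head[^>]*>\(.*\)</head>.*!\1!p' \
    | sed -n -e 's!.*<title>\(.*\)</title>.*!\1!p'
}

# Uppercase tags: the patterns are case-sensitive, so nothing is printed.
printf '<HEAD><TITLE>Shout</TITLE></HEAD>\n' | title_strict

# A commented-out title after the real one: the greedy leading .* skips
# to the last <title> in the head, so this prints "old" instead of "Real".
printf '<head><title>Real</title><!-- <title>old</title> --></head>\n' | title_strict
```

A parser that understands comments and case-insensitive tag names would not be fooled by either page.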

The paste and the two sed commands can be merged into a single sed, at the cost of readability. However, this version has the advantage of working on multi-line titles:

wget --quiet -O - http://bit.ly/rQyhG5 \
  | sed -n -e 'H;${x;s!.*<head[^>]*>\(.*\)</head>.*!\1!;T;s!.*<title>\(.*\)</title>.*!\1!p}'
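The H and ${x;...} parts are what let a single sed see the whole page at once: H appends every line to the hold space, and on the last line x swaps the accumulated text into the pattern space. A minimal sketch of that idiom on its own, with a made-up helper name and input:

```shell
# Hypothetical helper: collect all input lines, then join them with spaces.
slurp_lines() {
  sed -n -e 'H;${x;s/\n/ /g;p;}'
}

# Prints " a b c". The leading space comes from the newline that H
# inserts in front of the first line, since the hold space starts empty.
printf 'a\nb\nc\n' | slurp_lines
```

The semicolon before the closing brace is there for sed implementations that require it.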

Update:

As explained in the comments, the last sed above uses the T command, which is a GNU extension. If you do not have a compatible version, you can use:

wget --quiet -O - http://bit.ly/rQyhG5 \
  | sed -n -e 'H;${x;s!.*<head[^>]*>\(.*\)</head>.*!\1!;tnext;b;:next;s!.*<title>\(.*\)</title>.*!\1!p}'

Update 2:

As the above still does not work on Mac, try:

wget --quiet -O - http://bit.ly/rQyhG5 \
  | sed -n -e 'H;${x;s!.*<head[^>]*>\(.*\)</head>.*!\1!;tnext};b;:next;s!.*<title>\(.*\)</title>.*!\1!p'

and/or

cat << EOF > script
H
\$x
\$s!.*<head[^>]*>\(.*\)</head>.*!\1!
\$tnext
b
:next
s!.*<title>\(.*\)</title>.*!\1!p
EOF
wget --quiet -O - http://bit.ly/rQyhG5 \
  | sed -n -f script

(Note the \ before the $ to avoid variable expansion.)
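An alternative to escaping each $ is to quote the here-document delimiter, which turns off expansion inside the here-document. The sketch below assumes the sed program matches head and title tags as in the earlier commands. The helper name write_title_script, the temporary file, and the sample page are made up.

```shell
# Hypothetical helper: write the sed program to the path given as $1.
# Quoting 'EOF' disables variable expansion, so $ needs no backslash.
write_title_script() {
  cat << 'EOF' > "$1"
H
$x
$s!.*<head[^>]*>\(.*\)</head>.*!\1!
$tnext
b
:next
s!.*<title>\(.*\)</title>.*!\1!p
EOF
}

script=$(mktemp)
write_title_script "$script"

# Prints the title on two lines ("Multi" then "line"), since the
# newline inside the title is preserved.
printf '<html>\n<head lang="en">\n<title>Multi\nline</title>\n</head>\n</html>\n' \
  | sed -n -f "$script"

rm -f "$script"
```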

It seems that the :next label does not like being prefixed by a $, which could be a problem in some sed versions.
