Zenet

Reputation: 7411

How to extract a page title

I am trying to extract the page title from an HTML page with this pipeline:

cat index.html | grep -i "title>"| sed 's/<title>/ /i'| sed 's/<\/title>/ /i'

This breaks when the whole page is written on a single line (believe me, it happens): grep then returns the entire page, and sed only strips the two tags from it.

How do I solve that?

Thanks!

Upvotes: 1

Views: 1269

Answers (2)

ghostdog74

Reputation: 342373

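This awk one-liner also works for a title that spans more than one line. It sets the record separator to </title>, so it needs an awk that accepts a multi-character RS (gawk and mawk do).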

$ cat file
<html>
    <title>How to extract a page
title - Stack Overflow</title>
    <link rel="stylesheet" href="http://sstatic.net/so/all.css?v=4864b39b46cf">
    <link rel="shortcut icon" href="http://sstatic.net/so/favicon.ico">
    <link rel="apple-touch-icon" href="http://sstatic.net/so/apple-touch-icon.png">
</html>

$ awk 'BEGIN{RS="</title>"}/title/{gsub(".*<title>","");print}' file
How to extract a page
title - Stack Overflow
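
The same idea can be wrapped in a small function that also covers the one-line page from the question. This is only a sketch: the name extract_title is made up for illustration, it assumes gawk or mawk (for the multi-character RS), it assumes the tags are lowercase with no attributes, and it folds line breaks inside the title into single spaces.

```shell
# extract_title FILE: print the text of the first <title> element in FILE.
# Hypothetical helper; assumes gawk or mawk, lowercase tags, no attributes.
extract_title() {
    awk 'BEGIN { RS = "</title>" }      # each record ends at a closing tag
         /<title>/ {
             sub(/.*<title>/, "")       # drop everything up to the opening tag
             gsub(/[ \t\r\n]+/, " ")    # fold line breaks and repeated blanks
             print
             exit                       # stop after the first title
         }' "$1"
}
```

Calling extract_title index.html prints the title on one line, whether the page is a single long line or the title is split across lines.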

Upvotes: 0

mcandre

Reputation: 24602

sed -n 's/.*<title>\(.*\)<\/title>.*/\1/ip;T;q'

Taken from Linux Commands, the first Google result for "unix extract page title".
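
Two caveats on the command above: the I flag and the T command are GNU sed extensions, and sed works line by line, so a title split over several lines will not match. A rough variant that sidesteps both is sketched below. The function name extract_title_sed is invented for this example, and it assumes a single <title> element without attributes. It flattens the page to one line with tr first, then matches the tags case-insensitively with bracket expressions.

```shell
# extract_title_sed FILE: print the text of the <title> element in FILE.
# Hypothetical helper; assumes one <title> element with no attributes.
extract_title_sed() {
    # Turn CR and LF into spaces so the whole page is one line, then keep
    # only the text between the opening and closing title tags.
    tr '\r\n' '  ' < "$1" |
        sed -n 's/.*<[Tt][Ii][Tt][Ll][Ee]>\(.*\)<\/[Tt][Ii][Tt][Ll][Ee]>.*/\1/p'
}
```

Because the match is greedy, a page containing more than one title element would need a different approach, such as a real HTML parser.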

Upvotes: 1
