Ashraf Lobo

Reputation: 157

How to use sed to extract text from a webpage

Hey, I'm using a combination of sed and curl to extract some text from the web page example.com.

Here is my code:

curl -s http://example.com | sed -n -e 's/.*<h1>\(.*\)<\/h1>.*<p>\(This.*\)<\/p>/\1 \n \2/p'

However, I don't get any output. What am I doing wrong?

Upvotes: 0

Views: 423

Answers (1)

Donat

Reputation: 4813

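Although sed is generally not the right tool for extracting text from web pages, it may be sufficient for simple tasks. sed is a line-oriented tool, so each line is handled separately. Your expression only matches a single line that contains both <h1>...</h1> and <p>This...</p>. In the HTML that example.com served, the heading and the paragraph are on separate lines (and the paragraph text wraps onto a second line), so no line matches, and because of -n nothing is printed.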
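
To see the line-by-line behaviour without depending on the live page, here is a minimal sketch that runs the question's expression against a hard-coded HTML snippet. The snippet is an assumption that only mimics the layout example.com used (heading and paragraph on separate lines, paragraph wrapped), and the function names single_pattern and per_line_patterns are illustrative, not part of any tool.

```shell
#!/bin/sh
# Hypothetical sample HTML mimicking example.com's body layout (an assumption):
# <h1> and <p> sit on separate lines, and the paragraph text wraps.
html='<div>
    <h1>Example Domain</h1>
    <p>This domain is for use in illustrative examples in documents. You may use this
    domain in literature without asking for prior permission.</p>
</div>'

# The question's expression: it needs <h1>...</h1> and <p>This...</p> on ONE line.
single_pattern() {
    sed -n -e 's/.*<h1>\(.*\)<\/h1>.*<p>\(This.*\)<\/p>/\1 \n \2/p'
}

# One expression per tag: each one can match the line its tag lives on.
per_line_patterns() {
    sed -n -e 's/.*<h1>\(.*\)<\/h1>.*/\1/p' -e 's/.*<p>\(This.*\)/\1/p'
}

echo "single pattern:"
printf '%s\n' "$html" | single_pattern      # prints nothing: no line matches
echo "one pattern per line:"
printf '%s\n' "$html" | per_line_patterns   # heading, then first paragraph line
```

Splitting the work into one expression per tag is what lets sed match, since it only ever sees one line of input at a time.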

If you really want to do it with sed, this will give you some output:

curl -s http://example.com | sed -n -e 's/.*<h1>\(.*\)<\/h1>/\1 \n/p' -e 's/<p>\(This.*\)/\1 \n/p'
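
When the paragraph text wraps, the command above prints only the first line of the paragraph. One way to stay line-oriented but still capture the whole paragraph is a sed range address, /start/,/end/, which applies the commands to every line from a match of the first pattern through the next match of the second. This is a minimal sketch on an assumed sample rather than the live page; paragraph is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical sample mimicking a wrapped paragraph like the one on example.com.
html='<div>
    <h1>Example Domain</h1>
    <p>This domain is for use in illustrative examples in documents. You may use this
    domain in literature without asking for prior permission.</p>
</div>'

# For every line from the one containing "<p>This" through the next "</p>":
# strip leading indentation, remove <p> and </p> tags, then print the line.
paragraph() {
    sed -n '/<p>This/,/<\/p>/{
        s/^[[:space:]]*//
        s/<\/\{0,1\}p>//g
        p
    }'
}

printf '%s\n' "$html" | paragraph
```

To get the paragraph on a single line, the output can be piped through paste -s -d ' ' -.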

Upvotes: 1
