FooBar

Reputation: 11

Remove links from text file

How can I remove links from raw HTML text? I've got:

Foo bar <a href="http://www.foo.com">blah</a> bar foo 

and want to get:

Foo bar blah bar foo

afterwards.

Upvotes: 1

Views: 911

Answers (4)

ghostdog74

Reputation: 342303

$ echo 'Foo bar <a href="http://www.foo.com">blah</a> bar foo' | awk 'BEGIN{RS="</a>"; ORS=""}/<a href/{gsub(/<a href=\042.*\042>/,"")}1'

Foo bar blah bar foo

Upvotes: 0

danlei

Reputation: 14291

sed -re 's|<a [^>]*>([^<]*)</a>|\1|g'

But Brian's answer is right: This should only be used in very simple cases.
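
To illustrate what "very simple cases" means, here is a minimal sketch that ports the sed expression above to Python's `re` module. The function name `strip_links_regex` and the nested sample input are assumptions made for illustration. The pattern handles the flat link from the question, but leaves a link untouched as soon as its body contains another tag.

```python
import re

# Same pattern as the sed expression above, ported to Python's re module.
LINK_RE = re.compile(r'<a [^>]*>([^<]*)</a>')

def strip_links_regex(text):
    """Replace each <a ...>text</a> with its inner text (flat links only)."""
    return LINK_RE.sub(r'\1', text)

simple = 'Foo bar <a href="http://www.foo.com">blah</a> bar foo'
# Hypothetical input: the link body contains another tag.
nested = 'Foo bar <a href="http://www.foo.com"><b>blah</b></a> bar foo'

print(strip_links_regex(simple))  # Foo bar blah bar foo
print(strip_links_regex(nested))  # unchanged: [^<]* cannot cross the <b> tag
```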

Upvotes: 2

patrick

Reputation: 6840

Try:

sed -e 's/<a[^>]*>\([^<]*\)<\/a>/\1/g' test.txt

Upvotes: 0

Brian Agnew

Reputation: 272237

You're trying to parse HTML with regexps, which works only in the simplest cases, because HTML isn't a regular language. A much more reliable solution is to use an HTML parser. Many exist, for many different languages.
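
As one possible illustration of the parser approach, here is a minimal sketch using Python's standard-library `html.parser`. The class name `LinkStripper` and the helper `strip_links` are hypothetical names chosen for this example, not part of any library. It re-emits everything it sees except the `<a>` start and end tags, so link text and nested markup survive.

```python
from html.parser import HTMLParser

class LinkStripper(HTMLParser):
    """Re-emit the input, dropping only <a ...> and </a> tags."""

    def __init__(self):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>.
        if tag != "a":
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "a":
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")

def strip_links(html):
    parser = LinkStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)

print(strip_links('Foo bar <a href="http://www.foo.com">blah</a> bar foo'))
# Foo bar blah bar foo
print(strip_links('Foo bar <a href="http://www.foo.com"><b>blah</b></a> bar foo'))
# Foo bar <b>blah</b> bar foo
```

Unlike a regexp, this does not depend on how the attributes are quoted or on what the link body contains.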

Upvotes: 2
