Dan

Reputation: 10171

Is there a simple way in Linux to strip a website down to its text from the command line?

I've been searching for a command-line tool that would turn HTML into just the text that appears on the site, the equivalent of selecting everything in a web browser and pasting it into a text editor.

Does anyone know of something in Ubuntu that would do this? I'm trying to write a script to parse some web pages, and I'd rather work with the text that appears on the site than with the HTML.
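For illustration, a minimal sketch of the kind of script this would enable, assuming lynx is installed (as suggested in the answers below); the URL and the grep pattern are hypothetical placeholders:

lynx -dump http://example.com/ > page.txt
grep -i "some pattern" page.txt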

Thanks,

Dan

Upvotes: 6

Views: 3742

Answers (3)

shuvalov

Reputation: 4923

I think you need lynx:

lynx -dump http://stackoverflow.com > file

Upvotes: 3

John Boker

Reputation: 83719

If you already have the HTML file:

lynx -dump file.html > file.txt

Otherwise, use @Ignacio's answer.

Upvotes: 7

Ignacio Vazquez-Abrams

Reputation: 799140

lynx -dump http://example.com/
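A usage note not from the original answer: lynx's -dump output ends with a numbered list of the page's links; if I recall correctly, the -nolist option suppresses that, e.g.:

lynx -dump -nolist http://example.com/ > file.txt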

Upvotes: 12
