Reputation: 10171
I've been searching for a command-line tool that turns HTML into just the text that would appear on the rendered page, equivalent to selecting everything in a web browser and pasting it into a text editor.
Does anyone know of something on Ubuntu that does this? I'm trying to write a script to parse some web pages, and I would rather parse the text that appears on the page than deal with the HTML itself.
Thanks,
Dan
Upvotes: 6
Views: 3742
Reputation: 4923
I think you need lynx:
lynx -dump http://stackoverflow.com > file
Upvotes: 3
Reputation: 83719
If you already have the HTML file:
lynx -dump file.html > file.txt
Otherwise, use @Ignacio's command with the URL.
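Since the goal is to parse the text from a script, here is a minimal sketch of wrapping the same call in a shell function. The function name `page_text` is made up for illustration, and it assumes lynx is already installed. The `-nolist` option is added on the assumption that you do not want the numbered list of links that lynx appends to a dump by default; drop it if you do.

```shell
#!/bin/sh
# Hypothetical helper: render a URL or a local HTML file as plain text.
# Assumes lynx is installed and on the PATH.
page_text() {
    # -dump   prints the rendered page to stdout and exits
    # -nolist omits the numbered list of links appended to the dump
    lynx -dump -nolist "$1"
}
```

It works the same way for a URL or a local file, and the output can be piped straight into other tools, for example `page_text file.html | grep -i 'some phrase'`.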
Upvotes: 7