Reputation: 684
Basically, I have a webpage with embedded CSS and JavaScript, and I want to extract only the HTML itself: text, tables, images and so on.
So far I have the whole web page stored in a string called "html". For example, the page is just the Facebook homepage, but as you will see it contains scripts and other embedded content that I don't want.
// HTMLEdit holds the webpage I chose to load
string html = HTMLEdit.DocumentText;
string result; // should contain only the <head>, <body> and footer markup
I only want to display the result, which should contain only HTML: no JavaScript, CSS or other embedded content.
I have looked at the HTML Agility Pack, but I couldn't find documentation on its website for doing this. This is the first C# project I have ever made, so please excuse my ignorance if I'm not making sense.
Upvotes: 0
Views: 175
Reputation: 27581
See the question "HTML Agility Pack strip tags NOT IN whitelist".
You could adapt that answer so that it drops link and script tags.
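Below is a minimal sketch of the same idea, using a blacklist instead of the whitelist from the linked question. It assumes the HtmlAgilityPack NuGet package is referenced; the names HtmlCleaner and StripScriptsAndStyles are hypothetical. Besides the link and script tags suggested above, it also removes style and noscript elements and the inline style and on* (event handler) attributes, since those carry CSS or JavaScript too.

```csharp
// Assumes the HtmlAgilityPack NuGet package is referenced.
using System;
using System.Linq;
using HtmlAgilityPack;

static class HtmlCleaner // hypothetical helper name
{
    // Elements whose content is script or CSS rather than page markup.
    static readonly string[] UnwantedTags = { "script", "style", "link", "noscript" };

    public static string StripScriptsAndStyles(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // ToList() materializes the matches first, so removing nodes
        // does not modify the tree while Descendants() is enumerating it.
        var unwanted = doc.DocumentNode.Descendants()
            .Where(n => UnwantedTags.Contains(n.Name, StringComparer.OrdinalIgnoreCase))
            .ToList();
        foreach (var node in unwanted)
            node.Remove();

        // Drop inline CSS (style="...") and inline JavaScript handlers (onclick="...", onload="...").
        foreach (var node in doc.DocumentNode.Descendants().ToList())
        {
            var badAttrs = node.Attributes
                .Where(a => string.Equals(a.Name, "style", StringComparison.OrdinalIgnoreCase)
                         || a.Name.StartsWith("on", StringComparison.OrdinalIgnoreCase))
                .ToList();
            foreach (var attr in badAttrs)
                node.Attributes.Remove(attr);
        }

        return doc.DocumentNode.OuterHtml;
    }
}

static class Program
{
    static void Main()
    {
        string html = "<html><head><style>p{color:red}</style><script>alert(1)</script></head>" +
                      "<body onload=\"init()\"><p style=\"color:red\">Hello</p></body></html>";
        // Prints the markup with the style/script elements and inline attributes removed.
        Console.WriteLine(HtmlCleaner.StripScriptsAndStyles(html));
    }
}
```

With the setup from the question, this would be called as string result = HtmlCleaner.StripScriptsAndStyles(HTMLEdit.DocumentText);. Removing elements by name keeps everything else (text, tables, images) intact, which matches what the question asks for. The whitelist approach from the linked question is stricter if unknown tags should also be dropped.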
Upvotes: 2