Strip all HTML tags except certain ones?

Question

I have a requirement where I need to strip all tags out of a large block of HTML that is tag-soup, essentially stuff like:

etc.

I need to strip them all out, except the <p> tags, but on those I need to strip out the attributes such as style="" and just leave them as <p>.

I am currently stripping all tags with a regex:

public static string StripHtml(string input) => Regex.Replace(input, "<.*?>", string.Empty);

Any ideas on how to do this?

I would use a dedicated C# library for this, but I am using .NET Core on Linux, so a lot of these libraries (such as AngleSharp) that require the full framework aren't going to work for me.

Tracer69 · Accepted Answer

<((?!p\s).)*?> matches all tags except paragraph tags that carry attributes (the lookahead rejects any tag containing "p" followed by whitespace). So your program could delete all matches of this regex and then replace the remaining tags (all the p's) with bare paragraph tags. (regex for receiving all p-tags)
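The two steps above can also be folded into a single pass. The following is only a sketch of that idea, not the answerer's code: it swaps in a different pattern that captures the tag name, so that bare <p> and </p> tags are kept as well and attribute text such as title="stop now" cannot be mistaken for a paragraph. The names HtmlStripper and StripAllButParagraphs are made up for illustration. It relies only on System.Text.RegularExpressions, and it assumes no attribute value contains a literal ">" character; the contents of script and style elements are left in place as plain text.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlStripper
{
    // First alternative: an opening or closing tag, capturing the optional
    // slash ("close") and the tag name ("name").
    // Second alternative: anything else in angle brackets (comments, doctype).
    private static readonly Regex AnyTag = new Regex(
        @"<(?<close>/?)(?<name>[a-zA-Z][\w:-]*)[^>]*>|<[^>]*>",
        RegexOptions.Compiled);

    // Hypothetical helper: removes every tag except p, and rewrites each
    // p tag without its attributes.
    public static string StripAllButParagraphs(string input) =>
        AnyTag.Replace(input, m =>
        {
            bool isParagraph = string.Equals(
                m.Groups["name"].Value, "p", StringComparison.OrdinalIgnoreCase);

            // Drop every tag that is not a paragraph tag.
            if (!isParagraph) return string.Empty;

            // Re-emit paragraph tags in bare form.
            return m.Groups["close"].Value == "/" ? "</p>" : "<p>";
        });
}
```

Called as HtmlStripper.StripAllButParagraphs(html), this keeps the text content, drops tags such as div, span and pre, and normalizes upper-case or attributed paragraph tags to <p> and </p>. A self-closing <p/> also comes out as <p>.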
