Nyla Pareska

Reputation: 1375

regular expression for textarea

I am looking for a regular expression that removes every \r\n from an HTML file, except inside a textarea element, whose line breaks must be left intact.

I am using .NET (C#).

Upvotes: 0

Views: 1388

Answers (5)

trampster

Reputation: 8898

Read this: RegEx match open tags except XHTML self-contained tags

This question is like asking how to do up a bolt with a hammer. I'm sure that if you were determined enough you could tighten the bolt with a hammer. However, it would be difficult and problematic to say the least, and chances are you would break something by trying.

Take a step back, throw away the assumption that your hammer is the best tool, and go back to your toolbox. If you dig around in there you will find a better tool: it's called an HTML parser.

Upvotes: 0

Steve Wortham

Reputation: 22220

This is extremely similar to this answer I've given before.

Fortunately, .NET has a balanced matching feature.

So you can do this:

(<textarea[^>]*>[^<>]*(((?<Open><)[^<>]*)+((?<Close-Open>>)[^<>]*)+)*(?(Open)(?!))</textarea>)|\r\n

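Then replace each match with $1. A textarea match is captured in group 1 and so is put back unchanged, while a bare \r\n match leaves group 1 empty and is therefore removed.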

Here it is in action: http://regexhero.net/tester/?id=292c5529-5fe8-42e9-8d72-d7ea9ab9e1fe
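For anyone who wants to try this locally rather than in the online tester, here is a minimal sketch of the replace in C#. The pattern is the one above, copied verbatim; the sample HTML and the [CRLF] marker used for printing are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // The pattern from this answer. In a verbatim string, \r\n is passed
        // to the regex engine as escapes, not as literal characters.
        const string pattern =
            @"(<textarea[^>]*>[^<>]*(((?<Open><)[^<>]*)+((?<Close-Open>>)[^<>]*)+)*(?(Open)(?!))</textarea>)|\r\n";

        // Hypothetical sample input: one line break before the textarea,
        // one inside it, and one after it.
        string html = "<p>one</p>\r\n<textarea>line 1\r\nline 2</textarea>\r\n<p>two</p>";

        // "$1" re-inserts a matched textarea; for a bare \r\n match,
        // group 1 did not participate, so the replacement is empty.
        string result = Regex.Replace(html, pattern, "$1");

        // Make surviving line breaks visible when printing.
        Console.WriteLine(result.Replace("\r\n", "[CRLF]"));
        // prints: <p>one</p><textarea>line 1[CRLF]line 2</textarea><p>two</p>
    }
}
```

One thing to check against your own input: the quantifiers in the pattern are greedy, so a page with more than one textarea may be matched from the first opening tag through to the last closing tag, which would leave the line breaks between them in place.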

Hope that helps. The benefit of using balanced matching like this is that it's powerful enough to handle nested tags that are inherent to HTML.

However, it's still not 100% reliable: comments can throw it off. It is also an insanely complicated regular expression to maintain if you ever need to make changes, so you may still want to use an HTML parser after all.

Upvotes: 0

Mark Byers

Reputation: 837996

Don't use regular expressions - use an HTML parser.

Upvotes: 3

3Dave

Reputation: 29041

Speaking of HTML parsers, the Html Agility Pack is great for solving this type of problem.
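A rough sketch of what that could look like, assuming the HtmlAgilityPack NuGet package is referenced. The StripLineBreaks helper and the sample input are hypothetical, and I have not tested this against every version of the library, so treat it as a starting point. It only touches text nodes, so a line break inside a tag (between attributes, for example) would survive.

```csharp
using System;
using System.IO;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    // Hypothetical helper: removes \r\n from every text node that is not
    // inside a <textarea> element.
    static string StripLineBreaks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Snapshot the text nodes before editing them.
        var textNodes = doc.DocumentNode.Descendants()
                           .OfType<HtmlTextNode>()
                           .ToList();

        foreach (var node in textNodes)
        {
            // Leave textarea content alone.
            if (node.Ancestors("textarea").Any())
                continue;

            node.Text = node.Text.Replace("\r\n", "");
        }

        // Serialize the modified tree back to a string.
        using (var writer = new StringWriter())
        {
            doc.Save(writer);
            return writer.ToString();
        }
    }

    static void Main()
    {
        // Made-up sample input.
        string html = "<p>one</p>\r\n<textarea>line 1\r\nline 2</textarea>\r\n<p>two</p>";
        Console.WriteLine(StripLineBreaks(html));
    }
}
```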

Upvotes: 2

Dor

Reputation: 7484

Alternative approach:

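  1. Use a regex to find the positions (in the string) of every textarea element. A suitable regex would be: (<textarea[^>]*>(.*?)</textarea>), used with RegexOptions.Singleline so that . also matches line breaks and with [^>]* so that tags with attributes are found too.
  2. Remove the \r\n characters everywhere except in the ranges found in step 1.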
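The two steps above could be sketched in C# roughly like this. The StripOutsideTextareas name and the sample input are made up, and the pattern assumes textarea tags may carry attributes and span several lines (hence [^>]* and RegexOptions.Singleline). Like any regex-based approach, it can be fooled by things such as a textarea inside a comment.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

class Program
{
    // Hypothetical helper implementing the two steps.
    static string StripOutsideTextareas(string html)
    {
        // Step 1: locate every textarea element. Singleline lets '.' match
        // line breaks; the lazy .*? stops at the first closing tag.
        var textarea = new Regex(@"<textarea[^>]*>.*?</textarea>",
                                 RegexOptions.Singleline | RegexOptions.IgnoreCase);

        var sb = new StringBuilder();
        int pos = 0;

        foreach (Match m in textarea.Matches(html))
        {
            // Step 2: strip \r\n from the text before this textarea...
            sb.Append(html.Substring(pos, m.Index - pos).Replace("\r\n", ""));
            // ...and copy the textarea itself unchanged.
            sb.Append(m.Value);
            pos = m.Index + m.Length;
        }

        // Strip \r\n from whatever follows the last textarea.
        sb.Append(html.Substring(pos).Replace("\r\n", ""));
        return sb.ToString();
    }

    static void Main()
    {
        string html = "<p>one</p>\r\n<textarea rows=\"2\">line 1\r\nline 2</textarea>\r\n<p>two</p>";
        string result = StripOutsideTextareas(html);

        // Make surviving line breaks visible when printing.
        Console.WriteLine(result.Replace("\r\n", "[CRLF]"));
        // prints: <p>one</p><textarea rows="2">line 1[CRLF]line 2</textarea><p>two</p>
    }
}
```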

Upvotes: 0
