Nomi

Reputation: 127

C# regex filters

I have this regex filter: <+>|\P{L}

Numbers and HTML tags are deleted.

My problem is that spaces are also deleted and I don't want spaces to be deleted.

For example, I need to change the text "(0) Ship Out" to "Ship Out". Currently it returns "ShipOut".

How can I fix it?

Upvotes: 2

Views: 61

Answers (2)

vks

Reputation: 67968

 <+>|[^\p{L}\p{Z}]

You can use this filter for that.

See demo.

You can also match the characters you want to keep, rather than the ones to remove, with

\p{L}|(?<=\p{L})\p{Z}(?=\p{L})

if you want to preserve only the spaces between words.
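A minimal sketch of how that second pattern could be applied, assuming the matches are collected and concatenated (it describes what to keep, so it does not fit Regex.Replace with an empty replacement). The class and method names here are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeepMatchesDemo
{
    // Hypothetical helper: keeps letters, plus separators that sit
    // directly between two letters, by joining every match of the pattern.
    public static string KeepLettersAndInnerSpaces(string input)
    {
        var matches = Regex.Matches(input, @"\p{L}|(?<=\p{L})\p{Z}(?=\p{L})");
        return string.Concat(matches.Cast<Match>().Select(m => m.Value));
    }

    public static void Main()
    {
        // The space after ")" is dropped because it is not preceded by a letter.
        Console.WriteLine(KeepLettersAndInnerSpaces("(0) Ship Out")); // Ship Out
    }
}
```

Because the lookarounds require a letter on both sides, leading and trailing spaces are never matched, so no Trim() is needed with this approach.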

Upvotes: 0

Wiktor Stribiżew

Reputation: 626754

You are probably looking for a way to keep matching \P{L} (any character that is not a Unicode letter) without also matching whitespace.

Use the positive class \p{L} inside a negated character class together with \s: [^\p{L}\s] matches any character that is neither a letter nor whitespace.

I am not sure <+> is working for you: it matches one or more < characters followed by >, not a tag. You are probably looking for <[^<]*>.

So, my suggestion is

Regex.Replace(str, @"<[^<]*>|[^\p{L}\s]", string.Empty).Trim();

See demo


Trim() will get rid of leading and trailing whitespace.
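As a small self-contained sketch of the suggestion above, the replacement can be wrapped in a helper and run against the sample from the question. The class and method names are hypothetical; only the pattern and the Regex.Replace call come from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FilterDemo
{
    // Hypothetical helper: removes tag-like substrings and every character
    // that is neither a Unicode letter nor whitespace, then trims the ends.
    public static string KeepLettersAndSpaces(string input)
    {
        return Regex.Replace(input, @"<[^<]*>|[^\p{L}\s]", string.Empty).Trim();
    }

    public static void Main()
    {
        // "(0)" is removed, leaving " Ship Out"; Trim() drops the leading space.
        Console.WriteLine(KeepLettersAndSpaces("(0) Ship Out"));        // Ship Out
        Console.WriteLine(KeepLettersAndSpaces("<b>(0) Ship</b> Out")); // Ship Out
    }
}
```

Note that \s keeps all whitespace, including tabs and line breaks, not only the space character.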

Upvotes: 3
