Reputation: 127
I have this regex filter: <+>|\P{L}
Numbers and HTML tags are deleted.
My problem is that spaces are also deleted and I don't want spaces to be deleted.
For example, I need to change this text "(0) Ship Out" to this "Ship Out". Now it returns "ShipOut".
How can I fix it?
Upvotes: 2
Views: 61
Reputation: 67968
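<+>|[^\p{L}\p{Z}]
You can use this filter for that. It deletes every character that is neither a letter nor a separator. Note that writing it as \P{L}|\P{Z} would not work: every character is either a non-letter or a non-separator, so that alternation matches everything.
You can also match the characters you want to keep instead:
\p{L}|(?<=\p{L})\p{Z}(?=\p{L})
This matches letters, plus a separator only when it sits directly between two letters, so it preserves single spaces between words only.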
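Since \p{L}|(?<=\p{L})\p{Z}(?=\p{L}) describes what to keep rather than what to delete, it would be used with Regex.Matches and the matches joined back together. A minimal sketch, assuming a hypothetical helper named KeepWords (not part of the original answer):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeepLettersDemo
{
    // Hypothetical helper: collects every letter, plus any separator that
    // sits directly between two letters, and concatenates the matches.
    public static string KeepWords(string input) =>
        string.Concat(
            Regex.Matches(input, @"\p{L}|(?<=\p{L})\p{Z}(?=\p{L})")
                 .Cast<Match>()           // MatchCollection is non-generic on older frameworks
                 .Select(m => m.Value));
}
```

With the input from the question, KeepLettersDemo.KeepWords("(0) Ship Out") should give "Ship Out", because the space after ")" is not preceded by a letter and is therefore skipped.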
Upvotes: 0
Reputation: 626754
You might be looking for a way to still match \P{L} (any character that is not a Unicode letter) without matching whitespace. Just put the opposite shorthand class, \p{L}, inside a negated character class together with \s: [^\p{L}\s]. This matches any character that is neither a letter nor whitespace.
I'm not sure <+> is working for you, since it only matches one or more < followed by >. You might be looking for <[^<]*>, which matches a whole tag.
So, my suggestion is
Regex.Replace(str, @"<[^<]*>|[^\p{L}\s]", string.Empty).Trim();
Trim() will get rid of leading and trailing whitespace.
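To try the suggestion on the string from the question, the call can be wrapped in a small method. This is only a sketch; the class and method names (TagAndSymbolFilter, Clean) are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagAndSymbolFilter
{
    // Hypothetical wrapper around the Regex.Replace call suggested above:
    // removes tags and every character that is neither a letter nor
    // whitespace, then trims whitespace from both ends.
    public static string Clean(string str) =>
        Regex.Replace(str, @"<[^<]*>|[^\p{L}\s]", string.Empty).Trim();
}
```

TagAndSymbolFilter.Clean("(0) Ship Out") should return "Ship Out". Whitespace on both sides of a removed token is kept, so an input like "<b>Ship</b> 42 Out" would come back with two spaces between the words; collapsing those would need a separate step.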
Upvotes: 3