Reputation: 4675
I want to remove dashes that come before, after, or between space-separated words, but keep the hyphens inside hyphenated words.
This- -is - a test-sentence. -Test- --- One-Two--Three---Four----.
should become:
This is a test-sentence. Test One-Two--Three---Four.
Remove multiple dashes: ---.
Keep multiple hyphens: Three---Four.
I was trying to do it with this:
string sentence = "This- -is - a test-sentence. -Test- --- One-Two--Three---Four----.";
string regex = @"(?<!\w)\-(?!\-)|(?<!\-)\-(?!\w)";
sentence = Regex.Replace(sentence, regex, "");
Console.WriteLine(sentence);
But the output is:
This is a test-sentence. Test - One-TwoThree-Four--.
Upvotes: 2
Views: 1373
Reputation: 626689
You may just match all hyphens in between word chars, and remove all others with a simple
Regex.Replace(s, @"\b(-+)\b|-", "$1")
See the regex demo
Details
\b(-+)\b
- a word boundary, followed by 1+ hyphens (captured in group 1), and then another word boundary (that is, hyphen(s) in between letters, digits and underscores)
|
- or
-
- a hyphen in any other context (group 1 is empty there, so replacing with $1 removes it)

See the C# demo:
var s = "This- -is - a test-sentence. -Test- --- One-Two--Three---Four----.";
var result = Regex.Replace(s, @"\b(-+)\b|-", "$1");
Console.WriteLine(result);
// => This is  a test-sentence. Test  One-Two--Three---Four.
// (the removed " - " and " --- " each leave two adjacent spaces behind)
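
For readers who find the "$1" trick opaque, the same replacement can be written with an explicit match evaluator. This is only a sketch: the class and method names (HyphenFilter, KeepInnerHyphens) are made up for illustration, and the pattern is the one from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HyphenFilter
{
    // Same pattern as above: group 1 captures hyphens that sit between two word characters.
    private static readonly Regex Hyphens = new Regex(@"\b(-+)\b|-");

    // Hypothetical helper: keeps hyphen runs inside words, drops every other hyphen.
    public static string KeepInnerHyphens(string input)
    {
        return Hyphens.Replace(
            input,
            m => m.Groups[1].Success ? m.Groups[1].Value : string.Empty);
    }

    public static void Main()
    {
        // Hyphen runs between letters survive; the trailing run before the dot is removed.
        Console.WriteLine(KeepInnerHyphens("One-Two--Three---Four----."));
        // => One-Two--Three---Four.

        // Underscores and digits count as word characters for \b, so those hyphens stay.
        Console.WriteLine(KeepInnerHyphens("-lead a_-_b 3-4 trail-"));
        // => lead a_-_b 3-4 trail
    }
}
```

Note that \b treats underscores and digits as word characters, so a hyphen in a_-_b or 3-4 is kept just like one between letters.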
Upvotes: 1
Reputation: 12937
You can use \b|\s for this task.
Breakdown shamelessly copied from regex101.com:
/(\b|\s)(-{3})(\b|\s)/g
(\b|\s)
\b
assert position at a word boundary (^\w|\w$|\W\w|\w\W)
\s
matches any whitespace character (equal to [\r\n\t\f\v ])
(-{3})
-
matches the character - literally (case sensitive)
{3}
Quantifier: matches exactly 3 times
(\b|\s)
\b
assert position at a word boundary (^\w|\w$|\W\w|\w\W)
\s
matches any whitespace character (equal to [\r\n\t\f\v ])
Upvotes: 2
Reputation: 42304
What I would recommend is combining a positive lookbehind and a positive lookahead against the characters that you don't want the dashes to be next to. In your case, that would be spaces and full stops. If either the lookbehind or the lookahead matches, you want to remove that dash.
This would be: ((?<=[\s\.])\-+)|(\-+(?=[\s\.]))
Breaking this down:
((?<=[\s\.])\-+)
- match hyphens that follow either a space or a full stop
|
- or
(\-+(?=[\s\.]))
- match hyphens that are followed by either a space or a full stop

Here's a JavaScript example showcasing that:
const string = 'This- -is - a test-sentence. -Test- --- One-Two--Three---Four----.';
const regex = /((?<=[\s\.])\-+)|(\-+(?=[\s\.]))/g;
console.log(string.replace(regex, ''));
And this can also be seen on Regex101.
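Note that removing the dashes leaves extra spaces behind (for example, " - " becomes two spaces). .Trim() in C# only strips leading and trailing whitespace, so you'll probably also want to collapse the interior runs of spaces afterwards.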
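
Since the question is about C#, here is a rough port of the JavaScript snippet. It is a sketch rather than a definitive implementation: the names DashCleaner and RemoveLooseDashes are hypothetical, and the second Regex.Replace call assumes that runs of two or more spaces should collapse to a single space.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DashCleaner
{
    // Hypothetical helper: strips dashes adjacent to whitespace or a full stop.
    public static string RemoveLooseDashes(string input)
    {
        // Same lookaround pattern as the JavaScript example above.
        string stripped = Regex.Replace(input, @"((?<=[\s\.])\-+)|(\-+(?=[\s\.]))", "");

        // Assumption: double spaces left behind are unwanted, so collapse them;
        // Trim() then only has to deal with the two ends of the string.
        return Regex.Replace(stripped, @" {2,}", " ").Trim();
    }

    public static void Main()
    {
        string s = "This- -is - a test-sentence. -Test- --- One-Two--Three---Four----.";
        Console.WriteLine(RemoveLooseDashes(s));
        // => This is a test-sentence. Test One-Two--Three---Four.
    }
}
```

One limitation of this pattern: a dash at the very start or end of the string is only removed if its other side touches whitespace or a full stop, because the lookarounds do not match at the string boundaries.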
Upvotes: 2