Matt McManis

Reputation: 4675

Remove Dashes but Not Hyphens

I want to remove dashes that appear before, after, or between space-separated words, but keep the hyphens inside hyphenated words.

This- -is - a test-sentence. -Test- --- One-Two--Three---Four----.

should become:

This is a test-sentence. Test One-Two--Three---Four.

Remove standalone runs of dashes, such as ---.
Keep runs of hyphens inside a word, such as Three---Four.


I was trying to do it with this:

http://rextester.com/SXQ57185

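using System;
using System.Text.RegularExpressions;

string sentence = "This- -is - a test-sentence. -Test- --- One-Two--Three---Four----.";

// Attempt: remove a dash that is not preceded by a word character and not followed by a dash,
// or a dash that is not preceded by a dash and not followed by a word character.
string regex = @"(?<!\w)\-(?!\-)|(?<!\-)\-(?!\w)";
sentence = Regex.Replace(sentence, regex, "");

Console.WriteLine(sentence);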

But the output is:

This is a test-sentence. Test - One-TwoThree-Four--.

Upvotes: 2

Views: 1373

Answers (3)

Wiktor Stribiżew

Reputation: 626689

You can capture every run of hyphens that sits between word characters and put it back, while removing all other hyphens, with a single call:

Regex.Replace(s, @"\b(-+)\b|-", "$1")

See the regex demo

Details

  • \b(-+)\b - a word boundary, followed by one or more hyphens captured into group 1, and then another word boundary (that is, hyphens between letters, digits, or underscores)
  • | - or
  • - - a hyphen in any other context; group 1 is empty in this branch, so replacing with $1 removes it.

See the C# demo:

var s = "This- -is - a test-sentence. -Test- --- One-Two--Three---Four----.";
var result = Regex.Replace(s, @"\b(-+)\b|-", "$1");
Console.WriteLine(result); 
// => This is  a test-sentence. Test  One-Two--Three---Four.

Upvotes: 1

Aniket Sahrawat

Reputation: 12937

You can use \b|\s for this task.

/(\b|\s)(-{3})(\b|\s)/g

DEMO

Breakdown shamelessly copied from regex101.com:

/(\b|\s)(-{3})(\b|\s)/g

  • 1st Capturing Group (\b|\s)
    • 1st Alternative \b
      • \b assert position at a word boundary (^\w|\w$|\W\w|\w\W)
    • 2nd Alternative \s
      • \s matches any whitespace character (equal to [\r\n\t\f\v ])
  • 2nd Capturing Group (-{3})
    • -{3} matches the character - literally (case sensitive)
    • {3} Quantifier — Matches exactly 3 times
  • 3rd Capturing Group (\b|\s)
    • 1st Alternative \b
      • \b assert position at a word boundary (^\w|\w$|\W\w|\w\W)
    • 2nd Alternative \s
      • \s matches any whitespace character (equal to [\r\n\t\f\v ])
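
As a quick check of what this pattern picks up, the sketch below runs it against the sample sentence from the question and lists each match together with its three captured groups. The variable and property names are illustrative only, and it assumes a JavaScript runtime that provides String.prototype.matchAll (Node 12 or later, or a current browser).

```javascript
// Sample sentence taken from the question.
const input = 'This- -is - a test-sentence. -Test- --- One-Two--Three---Four----.';
const pattern = /(\b|\s)(-{3})(\b|\s)/g;

// Collect every match along with what each capturing group consumed.
const matches = [...input.matchAll(pattern)].map((m) => ({
  text: m[0],   // the whole match
  before: m[1], // '' when \b matched, otherwise the whitespace character
  dashes: m[2], // always exactly three dashes
  after: m[3],  // '' when \b matched, otherwise the whitespace character
}));

console.log(matches);
```

On the sample input this produces two matches: the standalone " --- " (with the surrounding spaces captured in groups 1 and 3) and the "---" inside Three---Four (where groups 1 and 3 are empty because \b matched). The surrounding groups are therefore what distinguishes a standalone run from a hyphen run inside a word.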

Upvotes: 2

Obsidian Age

Reputation: 42304

What I would recommend is combining a positive lookbehind with a positive lookahead against the characters that you don't want the dashes to be next to. In your case, that would be spaces and full stops. If either the lookbehind or the lookahead matches, you want to remove that dash.

This would be: ((?<=[\s\.])\-+)|(\-+(?=[\s\.])).

Breaking this down:

  • ((?<=[\s\.])\-+) - match hyphens that follow either a space or a full stop
  • | - or
  • (\-+(?=[\s\.])) - match hyphens that are followed by either a space or a full stop

Here's a JavaScript example showcasing that:

const string = 'This- -is - a test-sentence. -Test- --- One-Two--Three---Four----.';
const regex = /((?<=[\s\.])\-+)|(\-+(?=[\s\.]))/g;
console.log(string.replace(regex, ''));

This can also be seen on Regex101.

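Note that removing a standalone dash leaves two adjacent spaces behind, so you'll probably also want to clean those up afterwards. In C#, .Trim() only strips leading and trailing whitespace, so collapse the inner runs with something like Regex.Replace(result, @" {2,}", " ").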
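
A minimal sketch of that clean-up step, assuming the goal is the single-spaced sentence from the question: it applies the pattern above and then collapses the runs of spaces left behind. The second replace call and the variable names are additions for illustration, not part of the original answer, and lookbehind requires a reasonably recent JavaScript engine.

```javascript
// Sample sentence taken from the question.
const input = 'This- -is - a test-sentence. -Test- --- One-Two--Three---Four----.';

// Pattern from the answer: dashes preceded or followed by whitespace or a full stop.
const dashPattern = /((?<=[\s\.])\-+)|(\-+(?=[\s\.]))/g;

// Step 1: remove the standalone dashes (this leaves some doubled spaces).
const stripped = input.replace(dashPattern, '');

// Step 2 (illustrative addition): collapse runs of spaces, then trim the ends.
const tidy = stripped.replace(/ {2,}/g, ' ').trim();

console.log(tidy);
// This is a test-sentence. Test One-Two--Three---Four.
```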

Upvotes: 2
