Aditii

Reputation: 355

Replace repeating characters with one with a regex

I need a regex to collapse repeated occurrences of these particular characters: if any of them occurs twice or more in a row, replace the run with a single one.

/[\s.'-,{2,0}]

These are the characters that, when they repeat, I need to replace with a single copy of the same character.

Upvotes: 8

Views: 24246

Answers (4)

amphetamachine

Reputation: 30595

The PCRE-compatible regex to match this would be:

/([\s.',-])\1+/

If you're using Perl, you can replace it using the following expression:

s/([\s.',-])\1+/$1/g

If you're using PHP, then you would use this syntax:

$out = preg_replace('/([\s.\',-])\1+/', '$1', $in);

Explanation

  • The () group captures a single character: either a whitespace character (\s) or one of the punctuation characters . ' , -. It's good practice to put - at the end of the list inside [], because between two other characters it denotes a range.
  • The \1+ means that the same character just captured by the parentheses occurs at least once more.
  • In the replacement, $1 refers to the text captured by the first set of parentheses.
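The asker mentioned JavaScript in a comment, so here is a minimal sketch of the same backreference pattern in JavaScript. `collapseRepeats` is a hypothetical helper name, not part of any library. JavaScript writes the backreference as `\1` inside the pattern but as `$1` in the replacement string.

```javascript
// Hypothetical helper: collapse runs of the same whitespace or punctuation
// character (. ' , -) into a single copy, using a backreference.
function collapseRepeats(input) {
  // ([\s.',-]) captures one character from the set;
  // \1+ requires one or more further copies of that exact character.
  return input.replace(/([\s.',-])\1+/g, "$1");
}

console.log(collapseRepeats("a  b....,,"));   // a b.,
console.log(collapseRepeats("it's -- ok''")); // it's - ok'
console.log(collapseRepeats("foo,. bar"));    // foo,. bar (different characters are left alone)
```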

Note: this is Perl-Compatible Regular Expression (PCRE) syntax.
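To see why the position of - matters, here is a small JavaScript check of the class from the question, written as `/[\s.'-,{2,0}]/` (assuming the missing closing slash). There `'-,` is parsed as the range from `'` (U+0027) to `,` (U+002C), so `(`, `)`, `*` and `+` match while `-` itself does not, and `{2,0}` inside brackets is just literal characters, not a quantifier.

```javascript
// The character class from the question, with the closing slash added.
const attempt = /[\s.'-,{2,0}]/;
// The intended set of characters, with "-" moved to the end so it is literal.
const fixed = /[\s.',-]/;

for (const ch of ["-", "(", "+", "{", "2", ",", "."]) {
  // Prints the character, whether the question's class matches it,
  // and whether the corrected class matches it.
  console.log(JSON.stringify(ch), attempt.test(ch), fixed.test(ch));
}
```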

From the perlretut man page:

Matching repetitions

The examples in the previous section display an annoying weakness. We were only matching 3-letter words, or chunks of words of 4 letters or less. We'd like to be able to match words or, more generally, strings of any length, without writing out tedious alternatives like \w\w\w\w|\w\w\w|\w\w|\w.

This is exactly the problem the quantifier metacharacters ?, *, +, and {} were created for. They allow us to delimit the number of repeats for a portion of a regexp we consider to be a match. Quantifiers are put immediately after the character, character class, or grouping that we want to specify. They have the following meanings:

  • a? means: match 'a' 1 or 0 times

  • a* means: match 'a' 0 or more times, i.e., any number of times

  • a+ means: match 'a' 1 or more times, i.e., at least once

  • a{n,m} means: match at least "n" times, but not more than "m" times.

  • a{n,} means: match at least "n" or more times

  • a{n} means: match exactly "n" times
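The quantifier meanings quoted above can be checked quickly in JavaScript, which uses the same quantifier syntax. Each pattern is anchored with `^...$` so the whole string must match; the last line shows that the `\1+` in the answer is shorthand for `\1{1,}`.

```javascript
// Each quantifier applied to "a", anchored so the entire string must match.
const quantifiers = {
  "a?": /^a?$/,         // 1 or 0 times
  "a*": /^a*$/,         // 0 or more times
  "a+": /^a+$/,         // 1 or more times
  "a{2,3}": /^a{2,3}$/, // at least 2, at most 3 times
  "a{2,}": /^a{2,}$/,   // 2 or more times
  "a{2}": /^a{2}$/,     // exactly 2 times
};

for (const [name, re] of Object.entries(quantifiers)) {
  const matched = ["", "a", "aa", "aaa", "aaaa"].filter((s) => re.test(s));
  console.log(name, matched);
}

// \1+ and \1{1,} are equivalent: both collapse a run of one repeated character.
console.log("x---y".replace(/(-)\1+/g, "$1"), "x---y".replace(/(-)\1{1,}/g, "$1")); // x-y x-y
```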

Upvotes: 18

MarcoS

Reputation: 13564

If I understand correctly, you want to do the following: given a set of characters, replace any multiple occurrence of each of them with a single character. Here's how I would do it in Perl:

perl -pi.bak -e "s/\.{2,}/\./g; s/\-{2,}/\-/g; s/'{2,}/'/g" text.txt

If, for example, text.txt originally contains:

Here is . and here are 2 .. that should become a single one. Here's also a double -- that should become a single one. Finally here we have three ''' which should be substituted with one '.

it is modified as follows:

Here is . and here are 2 . that should become a single one. Here's also a double - that should become a single one. Finally here we have three ' which should be substituted with one '.

I simply use the same replacement regex for each character in the set, for example:

s/\.{2,}/\./g;

replaces 2 or more occurrences of a dot character with a single dot. I chain several of these expressions, one for each character in your original set.
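The same one-substitution-per-character approach can be sketched in JavaScript for comparison: an array of `[pattern, replacement]` pairs applied in order. `squeeze` and `rules` are hypothetical names. Like the Perl one-liner, this only covers `.`, `-` and `'`; handling whitespace or commas would mean adding another pair.

```javascript
// One rule per character: two or more occurrences become a single one.
const rules = [
  [/\.{2,}/g, "."],
  [/-{2,}/g, "-"],
  [/'{2,}/g, "'"],
];

function squeeze(text) {
  // Apply each rule in turn, like the chained s///g expressions in Perl.
  return rules.reduce((acc, [re, rep]) => acc.replace(re, rep), text);
}

const before = "Here is . and here are 2 .. that should become a single one. Here's also a double -- that should become a single one. Finally here we have three ''' which should be substituted with one '.";
console.log(squeeze(before));
```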

There may be more compact ways of doing this, but I think this is simple and it works :)

I hope it helps.

Upvotes: 0

Paul

Reputation: 141839

Using JavaScript, as mentioned in a comment, and assuming (it's not entirely clear from your question) that the characters you want to replace are whitespace characters, ., ', -, and ,:

var str = 'a  b....,,';
str = str.replace(/(\s){2}|(\.){2}|('){2}|(-){2}|(,){2}/g, '$1$2$3$4$5');
// Now str === 'a b..,'
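Because each alternative uses {2}, the expression above replaces every pair, so a run of four dots becomes two. If any run should collapse to one character, a minimal variant (an adjustment, not part of the original answer) is to change {2} to {2,}. Groups that did not take part in a match are substituted as empty strings, which is why '$1$2$3$4$5' works. One caveat: (\s){2,} also matches mixed whitespace such as a space followed by a tab, and keeps only the last character.

```javascript
let str = "a  b....,,";
// {2,} matches the whole run; each group keeps the last character it matched.
str = str.replace(/(\s){2,}|(\.){2,}|('){2,}|(-){2,}|(,){2,}/g, "$1$2$3$4$5");
console.log(str); // a b.,
```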

Upvotes: 1

Ulrich Dangel

Reputation: 4625

As others said, it depends on your regex engine, but here is a small example of how you could do this: s/([ _,.-])\1*/\1/g

With sed:

$ echo "foo    , bar" | sed 's/\([ _,.-]\)\1*/\1/g'
foo , bar
$ echo "foo,. bar" | sed 's/\([ _,.-]\)\1*/\1/g'
foo,. bar

Upvotes: 1
