Reputation: 355
I need a regex to collapse repeated occurrences of these particular characters: if one of them occurs twice or more in a row, replace the run with a single copy.
/[\s.'-,{2,0}]
These are the characters that, when they repeat, I need to replace with a single copy of the same character.
Upvotes: 8
Views: 24246
Reputation: 30595
The PCRE-compatible regex to match this would be:
/([\s.',-])\1+/
If you're using Perl, you can replace it using the following expression:
s/([\s.',-])\1+/$1/g
If you're using PHP, then you would use this syntax:
$out = preg_replace('/([\s.\',-])\1+/', '$1', $in);
The () group matches a single character, in this case either a whitespace character (\s) or one of the punctuation characters (. ' - ,). It's good practice to put - at the end of the list inside [], so that it is taken as a literal hyphen rather than as a range operator; in the question's [\s.'-,], the '-, part is read as the range from ' to ,.
\1+ means that the same character just matched in the parentheses occurs at least once more.
$1 refers to the match in the first set of parentheses.
Note: this is Perl-Compatible Regular Expression (PCRE) syntax.
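For comparison, the same pattern should work unchanged in JavaScript, which supports the backreference \1 inside the pattern and $1 in the replacement string. A minimal sketch; collapseRepeats is a hypothetical helper name, not part of any library:

```javascript
// Collapse runs of the same character from the set {whitespace . ' , -}
// into a single copy, using the same backreference idea as the PCRE pattern.
function collapseRepeats(input) {
  // ([\s.',-]) captures one character from the set; \1+ requires one or
  // more further copies of exactly that character. "-" is last, so literal.
  return input.replace(/([\s.',-])\1+/g, "$1");
}

console.log(collapseRepeats("a  b....,,c--d''e")); // "a b.,c-d'e"
```

Because \1 must repeat the exact character that was captured, different neighbours such as ".," are left alone, and a space followed by a tab is not merged either.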
From the perlretut man page:
Matching repetitions
The examples in the previous section display an annoying weakness. We were only matching 3-letter words, or chunks of words of 4 letters or less. We'd like to be able to match words or, more generally, strings of any length, without writing out tedious alternatives like \w\w\w\w|\w\w\w|\w\w|\w.
This is exactly the problem the quantifier metacharacters ?, *, +, and {} were created for. They allow us to delimit the number of repeats for a portion of a regexp we consider to be a match. Quantifiers are put immediately after the character, character class, or grouping that we want to specify. They have the following meanings:
a? means: match 'a' 1 or 0 times
a* means: match 'a' 0 or more times, i.e., any number of times
a+ means: match 'a' 1 or more times, i.e., at least once
a{n,m} means: match at least "n" times, but not more than "m" times.
a{n,} means: match at least "n" or more times
a{n} means: match exactly "n" times
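JavaScript's RegExp uses the same quantifier syntax, so a small sketch there can show each form in action. Each pattern is anchored with ^ and $ so that test() checks the whole string. It also shows that a {n,m} whose minimum exceeds its maximum, like the question's {2,0}, does not compile; {2,} (two or more) is presumably what was intended. isValidPattern is a hypothetical helper name.

```javascript
// Anchored patterns, so test() checks the whole string, not a substring.
const cases = [
  ["a?", ["", "a", "aa"]],
  ["a*", ["", "a", "aaa"]],
  ["a+", ["", "a", "aaa"]],
  ["a{2,3}", ["a", "aa", "aaa", "aaaa"]],
  ["a{2,}", ["a", "aa", "aaaaa"]],
  ["a{2}", ["a", "aa", "aaa"]],
];

for (const [quantified, inputs] of cases) {
  const re = new RegExp(`^${quantified}$`);
  const results = inputs.map((s) => `${JSON.stringify(s)}:${re.test(s)}`);
  console.log(quantified, results.join(" "));
}

// RegExp throws a SyntaxError for malformed patterns, e.g. {2,0}.
function isValidPattern(source) {
  try {
    new RegExp(source);
    return true;
  } catch (err) {
    return false;
  }
}

console.log(isValidPattern("a{2,0}")); // false
console.log(isValidPattern("a{2,}")); // true
```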
Upvotes: 18
Reputation: 13564
If I understand correctly, you want to do the following: given a set of characters, replace any multiple occurrence of each of them with a single character. Here's how I would do it in perl:
perl -pi.bak -e "s/\.{2,}/\./g; s/\-{2,}/\-/g; s/'{2,}/'/g" text.txt
If, for example, text.txt originally contains:
Here is . and here are 2 .. that should become a single one. Here's also a double -- that should become a single one. Finally here we have three ''' which should be substituted with one '.
it is modified as follows:
Here is . and here are 2 . that should become a single one. Here's also a double - that should become a single one. Finally here we have three ' which should be substituted with one '.
I simply use the same replacement regex for each character in the set: for example
s/\.{2,}/\./g;
replaces 2 or more occurrences of a dot character with a single dot. I chain several of these expressions, one per character: the command above covers ., - and ', and each further character from your set (such as ,) needs one more expression.
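The same one-substitution-per-character idea can be sketched in JavaScript by chaining replace() calls. This assumes the same three characters as the one-liner; collapseEach is a hypothetical name.

```javascript
// One replace() per character, mirroring the chained Perl substitutions:
// two or more consecutive copies of the character become a single copy.
function collapseEach(text) {
  return text
    .replace(/\.{2,}/g, ".")
    .replace(/-{2,}/g, "-")
    .replace(/'{2,}/g, "'");
}

const before =
  "Here is . and here are 2 .. that should become a single one. " +
  "Here's also a double -- that should become a single one. " +
  "Finally here we have three ''' which should be substituted with one '.";

console.log(collapseEach(before));
```

Handling commas would mean adding one more call, e.g. .replace(/,{2,}/g, ","). The trade-off is readability versus a single pass: each call scans the whole string again.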
There may be more compact ways of doing this, but, I think this is simple and it works :)
I hope it helps.
Upvotes: 0
Reputation: 141839
Using JavaScript, as mentioned in a comment, and assuming (it's not entirely clear from your question) that the characters you want to replace are whitespace characters, ., ', -, and ,:
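var str = 'a b....,,';
// \1+ ... \5+ repeat the captured character, so runs longer than two
// (such as '....') also collapse to a single character.
str = str.replace(/(\s)\1+|(\.)\2+|(')\3+|(-)\4+|(,)\5+/g, '$1$2$3$4$5');
// Now str === 'a b.,'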
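If the character set changes often, the pattern can be generated from a list instead of written out by hand. A minimal sketch; escapeForClass and makeCollapser are hypothetical helper names, and it swaps the one-group-per-character alternation for a single group plus a backreference (\1+), so runs of any length collapse and only runs of the same character are merged. It takes literal characters, so pass a space or tab rather than \s.

```javascript
// Hypothetical helper: escape the characters that are special inside [...]
// (backslash, closing bracket, caret and hyphen) so each one is literal.
function escapeForClass(ch) {
  return /[\\\]^-]/.test(ch) ? "\\" + ch : ch;
}

// Hypothetical helper: build a function that collapses runs of any one
// character from `chars` into a single copy of that character.
function makeCollapser(chars) {
  const cls = [...chars].map(escapeForClass).join("");
  const re = new RegExp(`([${cls}])\\1+`, "g");
  return (str) => str.replace(re, "$1");
}

const collapse = makeCollapser(" .',-");
console.log(collapse("a b....,,")); // "a b.,"
```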
Upvotes: 1
Reputation: 4625
As others said, it depends on your regex engine, but here is a small example of how you could do this:
s/([ _,.-])\1*/\1/g
With sed:
$ echo "foo , bar" | sed 's/\([ _,.-]\)\1*/\1/g'
foo , bar
$ echo "foo,. bar" | sed 's/\([ _,.-]\)\1*/\1/g'
foo,. bar
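Where the hyphen sits inside [] matters. A small JavaScript sketch, assuming JavaScript's RegExp applies the same hyphen rule as the other engines here: between two characters, - forms a range, so a class such as [ _-,.] contains the reversed range _-, and fails to compile, while [ _,.-] treats - literally. Likewise the question's [\s.'-,] contains the range '-, from ' to , and therefore never matches a literal -. compiles is a hypothetical helper name.

```javascript
// Returns true if the pattern compiles; RegExp throws SyntaxError otherwise.
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch (err) {
    return false;
  }
}

console.log(compiles("[ _-,.]")); // false: "_-," is a reversed range
console.log(compiles("[ _,.-]")); // true: trailing "-" is literal

// "'-," is a range from ' (U+0027) to , (U+002C): it also covers ( ) * +
// but not the hyphen itself.
const questionClass = /[\s.'-,]/;
console.log(questionClass.test("*")); // true
console.log(questionClass.test("-")); // false

// With "-" moved to the end, the class matches exactly the intended set.
const fixedClass = /[\s.',-]/;
console.log(fixedClass.test("*")); // false
console.log(fixedClass.test("-")); // true
```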
Upvotes: 1