Krzysztof Malinowski

Reputation: 115

preg_replace: remove runs of the same repeated non-word character

I want to remove runs of the same non-word character repeated consecutively.

My code looks like this:

<?php
$name = 'Malines - Blockbuster (prod. Malines) ***** (((((( &%^$';
echo preg_replace('/[^\pL\pN\s]{2,}/u', '', $name);
?>

The string Malines - Blockbuster (prod. Malines) ***** (((((( &%^$ should become Malines - Blockbuster (prod. Malines) &%^$

Only repeated non-word characters should be removed. Could you help me?

Upvotes: 0

Views: 101

Answers (2)

Casimir et Hippolyte

Reputation: 89567

Using the same idea as Arlaud Agbe Pierre (which is the way to do it), the pattern can be shortened using the Xan character class. \p{Xan} is the union of \p{L} and \p{N}. \P{Xan} (with an uppercase "P") is the negation of this class (i.e. everything that is not a letter or a number).

$str = 'Malines - Blockbuster (prod. Malines) ààà ***** (((((( &%^$';

echo preg_replace('~(\P{Xan})\1+~u', '$1', $str);

Note: with this pattern, runs of the same whitespace character are collapsed too, since whitespace is not part of Xan.

Another way:

echo preg_replace('~(\P{Xan})\K\1+~u', '', $str);

where \K resets the start of the reported match, so whatever was matched before it is left out of the match result. (Note that capturing groups defined before \K can still be referenced after it.)
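As a quick check, the sketch below runs both patterns on the same input so the results can be compared. The variable names ($viaBackref, $viaK, $spaces) and the short 'a    b--c' test string are illustrative and not from the answer above; the file is assumed to be saved as UTF-8 so that the u modifier treats ààà as letters.

```php
<?php
$str = 'Malines - Blockbuster (prod. Malines) ààà ***** (((((( &%^$';

// Variant 1: match the whole run, then put back one copy of the captured character.
$viaBackref = preg_replace('~(\P{Xan})\1+~u', '$1', $str);

// Variant 2: \K drops the first character from the reported match,
// so only the extra repetitions are replaced (with an empty string).
$viaK = preg_replace('~(\P{Xan})\K\1+~u', '', $str);

echo $viaBackref, PHP_EOL;
echo $viaK, PHP_EOL;

// Whitespace is not in Xan, so a run of the same whitespace character collapses too.
$spaces = preg_replace('~(\P{Xan})\1+~u', '$1', 'a    b--c');
echo $spaces, PHP_EOL;
```

Both variants should leave a single * and a single ( in place of each run, while ààà stays untouched because letters belong to Xan.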

Upvotes: 1

Pierre Arlaud

Reputation: 4133

You must use a backreference to say that the second character is the same as the one before it:

/([^\pL\pN\s])\1+/u
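A minimal sketch of how this pattern could be plugged into preg_replace, using the string from the question. The second call, which collapses the spaces left behind, is an addition assumed here in order to reach the expected output from the question; it is not part of the pattern above. The variable names are illustrative.

```php
<?php
$name = 'Malines - Blockbuster (prod. Malines) ***** (((((( &%^$';

// ([^\pL\pN\s]) captures one character that is not a letter, digit or whitespace;
// \1+ then requires one or more repetitions of that same character.
$withoutRuns = preg_replace('/([^\pL\pN\s])\1+/u', '', $name);

// Assumption: removing the runs leaves several spaces in a row,
// so collapse them to match the output the question asks for.
$result = preg_replace('/\s{2,}/u', ' ', $withoutRuns);

echo $result, PHP_EOL;
```

Using '$1' instead of '' as the replacement in the first call would keep one copy of each repeated character rather than removing the run entirely.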

Upvotes: 2
