Reputation: 81573
I'm looking for a neat regex solution to replace all non-alphanumeric characters, all newlines, and all runs of whitespace with a single space.
For those playing at home (the following does work)
text.replace(/[^a-z0-9]/gmi, " ").replace(/\s+/g, " ");
My thinking is regex is probably powerful enough to achieve this in one statement. The components I think I'd need are
[^a-z0-9] - to remove non-alphanumeric characters
\s+ - match any collections of spaces
\r?\n|\r - match all new lines
/gmi - global, multi-line, case insensitive
However, I can't seem to style the regex in the right way (the following doesn't work)
text.replace(/[^a-z0-9]|\s+|\r?\n|\r/gmi, " ");
Input
234&^%,Me,2 2013 1080p x264 5 1 BluRay
S01(*&asd 05
S1E5
1x05
1x5
Desired Output
234 Me 2 2013 1080p x264 5 1 BluRay S01 asd 05 S1E5 1x05 1x5
Upvotes: 206
Views: 233162
Reputation: 2607
const processString = (str) => (
  str
    .replace(/[^a-z0-9\s]/gi, '') // remove everything except alphanumerics and whitespace
    .replace(/ +/g, ' ') // collapse repeated spaces into one
);
processString(' $ your string here #');
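Note that this approach deletes symbols instead of turning them into a space, and it only collapses literal spaces. A minimal sketch of the difference, using a fragment of the question's input (the helper names `removeThenCollapse` and `replaceRuns` are made up for illustration):

```javascript
// Same two steps as above: delete symbols, then collapse literal spaces.
const removeThenCollapse = (str) =>
  str.replace(/[^a-z0-9\s]/gi, "").replace(/ +/g, " ");

// Alternative for comparison: turn each run of non-alphanumerics into one space.
const replaceRuns = (str) => str.replace(/[^a-z0-9]+/gi, " ");

const line = "234&^%,Me\nS01(*&asd 05";

// Deleting symbols fuses neighbouring tokens, and the newline survives.
console.log(JSON.stringify(removeThenCollapse(line))); // "234Me\nS01asd 05"

// Replacing runs keeps tokens apart and folds the newline into a space.
console.log(JSON.stringify(replaceRuns(line))); // "234 Me S01 asd 05"
```

Which behaviour is wanted depends on the use case; the question's desired output matches the second form.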
Upvotes: 0
Reputation: 18641
When Unicode comes into play, use
text.replace(/[^\p{L}\p{N}]+/gu," ");
EXPLANATION
NODE                     EXPLANATION
--------------------------------------------------------------------------------
[^\p{L}\p{N}]+           Any character except Unicode letters and digits
                         (1 or more times (matching the most amount possible))
JavaScript code snippet:
const text = `234&^%,Me,2 2013 1080p x264 5 1 BluRąy
S01(*&aśd 05
S1E5
1x05
1x5`
console.log(text.replace(/[^\p{L}\p{N}]+/gu, ` `))
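To see why the Unicode property classes matter, here is a small side-by-side sketch on a fragment of the input above. The variable names are illustrative only, and the `u` flag with `\p{...}` assumes an ES2018-capable engine:

```javascript
const word = "BluRąy S01(*&aśd";

// ASCII-only class: accented letters count as "non-alphanumeric" and are replaced.
const asciiOnly = word.replace(/[^a-z0-9]+/gi, " ");

// Unicode-aware class: any letter or digit in any script is kept.
const unicodeAware = word.replace(/[^\p{L}\p{N}]+/gu, " ");

console.log(asciiOnly);    // BluR y S01 a d
console.log(unicodeAware); // BluRąy S01 aśd
```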
Upvotes: 7
Reputation: 81573
Update
Please be aware that the browser landscape changes rapidly; these benchmarks will be woefully out of date, and likely misleading, by the time you are reading this.
This is an old post of mine, and the other answers are good for the most part. However, I decided to benchmark each solution and another obvious one (just for fun). I wondered if there was a difference between the regex patterns on different browsers with different sized strings.
So basically I used jsPerf on Chrome and Edge.
The regex patterns I tested were
/[\W_]+/g
/[^a-z0-9]+/gi
/[^a-zA-Z0-9]+/g
I loaded them up with strings of random characters at lengths of 5,000, 1,000, and 200.
Example JavaScript I used:
var newstr = str.replace(/[\W_]+/g," ");
Each run consisted of 50 or more samples on each regex, and I ran them 5 times on each browser.
Let's race our horses!
Results
                           Chrome                   Edge
Chars   Pattern            Ops/Sec      Deviation   Ops/Sec      Deviation
------------------------------------------------------------------------
5,000   /[\W_]+/g          19,977.80    1.09        10,820.40    1.32
5,000   /[^a-z0-9]+/gi     19,901.60    1.49        10,902.00    1.20
5,000   /[^a-zA-Z0-9]+/g   19,559.40    1.96        10,916.80    1.13
------------------------------------------------------------------------
1,000   /[\W_]+/g          96,239.00    1.65        52,358.80    1.41
1,000   /[^a-z0-9]+/gi     97,584.40    1.18        52,105.00    1.60
1,000   /[^a-zA-Z0-9]+/g   96,965.80    1.10        51,864.60    1.76
------------------------------------------------------------------------
200     /[\W_]+/g          480,318.60   1.70        261,030.40   1.80
200     /[^a-z0-9]+/gi     476,177.80   2.01        261,751.60   1.96
200     /[^a-zA-Z0-9]+/g   486,423.00   0.80        258,774.20   2.15
Truth be known, the regexes in both browsers (taking deviation into consideration) were nearly indistinguishable. However, I think if I ran this even more times the results would become a little clearer (but not by much).
Theoretical scaling for 1 character
                           Chrome                    Edge
Chars   Pattern            Ops/Sec      Scaled       Ops/Sec      Scaled
------------------------------------------------------------------------
5,000   /[\W_]+/g          19,977.80    99,889,000   10,820.40    54,102,000
5,000   /[^a-z0-9]+/gi     19,901.60    99,508,000   10,902.00    54,510,000
5,000   /[^a-zA-Z0-9]+/g   19,559.40    97,797,000   10,916.80    54,584,000
------------------------------------------------------------------------
1,000   /[\W_]+/g          96,239.00    96,239,000   52,358.80    52,358,800
1,000   /[^a-z0-9]+/gi     97,584.40    97,584,400   52,105.00    52,105,000
1,000   /[^a-zA-Z0-9]+/g   96,965.80    96,965,800   51,864.60    51,864,600
------------------------------------------------------------------------
200     /[\W_]+/g          480,318.60   96,063,720   261,030.40   52,206,080
200     /[^a-z0-9]+/gi     476,177.80   95,235,560   261,751.60   52,350,320
200     /[^a-zA-Z0-9]+/g   486,423.00   97,284,600   258,774.20   51,754,840
I wouldn't read too much into these results, as there is no really significant difference between the patterns; all we can really tell is that Edge is slower :o. And that I was super bored.
Anyway, you can run the benchmark for yourself.
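For anyone who wants to repeat a rough version of this locally, here is a minimal sketch. It is not the original jsPerf test: `randomString`, `opsPerSecond`, and `scaled` are hypothetical helpers, the iteration count is arbitrary, and a plain timing loop is much cruder than a real benchmark harness. The "Scaled" column in the table above works out to ops/sec multiplied by the string length, which is what `scaled` computes.

```javascript
// Hypothetical helper: a string of `length` random printable ASCII characters.
function randomString(length) {
  let out = "";
  for (let i = 0; i < length; i++) {
    out += String.fromCharCode(32 + Math.floor(Math.random() * 95)); // codes 32..126
  }
  return out;
}

// Hypothetical helper: rough operations per second for one pattern.
function opsPerSecond(pattern, str, iterations = 2000) {
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    str.replace(pattern, " ");
  }
  const elapsedMs = Math.max(Date.now() - start, 1); // guard against dividing by zero
  return (iterations / elapsedMs) * 1000;
}

// "Scaled" column: ops/sec multiplied by the number of characters.
const scaled = (opsPerSec, chars) => Math.round(opsPerSec * chars);

const patterns = [/[\W_]+/g, /[^a-z0-9]+/gi, /[^a-zA-Z0-9]+/g];
for (const chars of [5000, 1000, 200]) {
  const str = randomString(chars);
  for (const pattern of patterns) {
    const ops = opsPerSecond(pattern, str);
    console.log(chars, String(pattern), ops.toFixed(2), scaled(ops, chars));
  }
}
```

Expect the absolute numbers to differ a lot from the table, since they depend on the machine and engine version.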
Upvotes: 6
Reputation: 109
For anyone still struggling (like me...) after the above more expert replies, this works in C# in Visual Studio 2019:
outputString = Regex.Replace(inputString, @"\W", "_");
Remember to add
using System.Text.RegularExpressions;
Upvotes: 1
Reputation: 1945
To replace underscores and dashes as well (note that \W already matches the dash), do the following:
text.replace(/[\W_-]/g,' ');
Upvotes: 3
Reputation: 2832
I saw a different post that also handled diacritical marks, which is great:
s.replace(/[^a-zA-Z0-9À-ž\s]/g, "")
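One caveat worth knowing: the range À-ž spans U+00C0 to U+017E, which also contains the multiplication sign × (U+00D7) and the division sign ÷ (U+00F7). A small sketch with a made-up sample string:

```javascript
const sample = "Café ž × ÷ #1";

// Same pattern as above: keeps ASCII alphanumerics, the À-ž range, and whitespace.
const cleaned = sample.replace(/[^a-zA-Z0-9À-ž\s]/g, "");

// "#" is removed, but × and ÷ survive because they sit inside the À-ž range.
console.log(cleaned); // Café ž × ÷ 1
```

If those two symbols matter for your input, they need to be excluded separately.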
Upvotes: 4
Reputation: 12389
Be aware that \W leaves the underscore. A short equivalent for [^a-zA-Z0-9] would be [\W_]
text.replace(/[\W_]+/g," ");
\W is the negation of the shorthand \w for [A-Za-z0-9_] word characters (including the underscore)
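A quick sketch of the difference, using a made-up input that contains an underscore (variable names are illustrative):

```javascript
const input = "snake_case-name & more";

// \W alone treats "_" as a word character, so it is left in place.
const keepsUnderscore = input.replace(/\W+/g, " ");

// Adding "_" to the class replaces it too, matching [^a-zA-Z0-9] on ASCII input.
const dropsUnderscore = input.replace(/[\W_]+/g, " ");
const explicitClass = input.replace(/[^a-zA-Z0-9]+/g, " ");

console.log(keepsUnderscore); // snake_case name more
console.log(dropsUnderscore); // snake case name more
console.log(dropsUnderscore === explicitClass); // true
```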
Upvotes: 349
Reputation: 1541
Jonny 5 beat me to it. I was going to suggest using the \W+ without the \s, as in text.replace(/\W+/g, " "). This covers white space as well.
Upvotes: 150
Reputation: 89629
Since the [^a-z0-9] character class contains everything that is not alphanumeric, it contains whitespace characters too!
text.replace(/[^a-z0-9]+/gi, " ");
Upvotes: 19
Reputation: 413976
Well I think you just need to add a quantifier to each pattern. Also the carriage-return thing is a little funny:
text.replace(/[^a-z0-9]+|\s+/gmi, " ");
Edit: the \s thing matches \r and \n too.
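To illustrate why the quantifier matters, here is a minimal sketch on a fragment of the question's input. Without `+`, every matched character gets its own replacement space; with it, each run collapses to one. The variable names are illustrative only:

```javascript
const sample = "234&^%,Me\nS01(*&asd 05";

// The question's pattern: the first alternative matches one character at a time,
// so "&^%," turns into four spaces.
const perChar = sample.replace(/[^a-z0-9]|\s+|\r?\n|\r/gi, " ");

// With "+", a whole run of non-alphanumerics (newline included) becomes one space.
const perRun = sample.replace(/[^a-z0-9]+/gi, " ");

console.log(JSON.stringify(perChar)); // four spaces after 234, three after S01
console.log(JSON.stringify(perRun));  // "234 Me S01 asd 05"
```

Since [^a-z0-9] already matches whitespace and line breaks, the extra alternatives add nothing once the quantifier is in place.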
Upvotes: 8