Brian O'Driscoll

Reputation: 31

How to unpunctuate, lowercase, de-space and hyphenate a string with regex?

If I have a string like this

Newsflash: The Big(!) Brown Dog's Brother (T.J.) Ate The Small Blue Egg

how would I convert that into the following using regex:

newsflash-the-big-brown-dogs-brother-tj-ate-the-small-blue-egg

In other words, punctuation is discarded and spaces are replaced with hyphens.

Upvotes: 3

Views: 3741

Answers (3)

Mike Clark

Reputation: 10136

It sounds like you want to create a "URL slug": a URL-friendly version of an article's title, for example. That means you'll want to remove every character that isn't URL-friendly, not just a few. You might do it this way (in order):

Remove all non-letter non-number non-space characters by:
Replacing regex [^A-Za-z0-9 ] with the empty string "".

Replace all spaces with a dash by:
Replacing regex \s+ with the string "-".

Lower-case the string by:
Java s = s.toLowerCase();
JavaScript s = s.toLowerCase();
C# s = s.ToLower();
Perl $s = lc($s);
Python s = s.lower()
PHP $s = strtolower($s);
Ruby s = s.downcase
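
For illustration, the three steps above could be chained like this in JavaScript. This is only a minimal sketch; the function name slugify is made up for the example and is not part of any library.

```javascript
// Hypothetical helper; the name "slugify" is an assumption for this example.
function slugify(title) {
  return title
    .replace(/[^A-Za-z0-9 ]/g, "") // 1. drop everything except letters, digits and spaces
    .replace(/\s+/g, "-")          // 2. turn each run of spaces into a single dash
    .toLowerCase();                // 3. lower-case the result
}

const title = "Newsflash: The Big(!) Brown Dog's Brother (T.J.) Ate The Small Blue Egg";
console.log(slugify(title));
// newsflash-the-big-brown-dogs-brother-tj-ate-the-small-blue-egg
```

Leading or trailing spaces in the input would end up as leading or trailing dashes, so calling trim() on the string first may be worthwhile.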

Upvotes: 5

moeffju

Reputation: 4403

Replace /\W+/ with '-'. That replaces each run of non-word characters with a single dash.

Then, collapse dashes by replacing /-+/ with '-'.

Then lowercase the string; a pure regex replacement cannot do that. You didn't say which language you are using, so I can't give an exact example, but your language probably has something like String.toLowerCase() or a tr/// operator (tr/A-Z/a-z/ in Perl, for example).
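
As a rough sketch of how this approach behaves on the string from the question (JavaScript, with a made-up function name): because the apostrophe and the periods are non-word characters too, they turn into dashes instead of disappearing, so the result is not quite the output the question asks for. Also, since "-" is itself a non-word character, the first replacement already swallows any runs of dashes, which makes the collapse step a no-op.

```javascript
// Hypothetical helper; the name is an assumption used only for this illustration.
function slugifyNonWord(title) {
  return title
    .replace(/\W+/g, "-") // each run of non-word characters becomes one dash
    .replace(/-+/g, "-")  // collapse repeated dashes (nothing left to collapse after \W+)
    .toLowerCase();
}

console.log(slugifyNonWord("Newsflash: The Big(!) Brown Dog's Brother (T.J.) Ate The Small Blue Egg"));
// newsflash-the-big-brown-dog-s-brother-t-j-ate-the-small-blue-egg
```

Note "dog-s" and "t-j" in the result. Underscores count as word characters, so they are left in place as well.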

Upvotes: 0

SLaks

Reputation: 887817

Replace the regex [\s-]+ with "-", then replace [^\w-] with "".

Then call toLowerCase or your language's equivalent.

In JavaScript:

var s = "Newsflash: The Big(!) Brown Dog's Brother (T.J.) Ate The Small Blue Egg";
alert(s.replace(/[\s-]+/g, '-').replace(/[^\w-]/g, '').toLowerCase());

Upvotes: 1
