DanSingerman

Reputation: 36522

How do I write a regex to match all single characters in a string?

I want to replace every single non-whitespace character in a string with a space.

I have tried this.

string = 'This is a test string'
string.gsub(/(\W|\A).(\W|\z)/, ' ')
 => "This is test string" 

Which works great. But if I have two consecutive single characters, it only finds the first.

string = 'This is a x test string'
string.gsub(/(\W|\A).(\W|\z)/, ' ')
 => "This is x test string"  

I am not sure which regex principle I am missing here that I need to make this work. Any ideas?

Upvotes: 1

Views: 1485

Answers (5)

hirolau

Reputation: 13921

And here is a non-regexp version:

string = 'This is x a test string'

single_character = -> x { x.size == 1 } 

p string.split(' ').reject(&single_character).join(' ') #=> "This is test string"

Upvotes: 1

SamWhan

Reputation: 8332

If I understand you correctly, you want to remove single instances of non-whitespace. Try replacing

\s\S(?!\S)|(?<!\S)\S\s

with nothing - "".

See an example here at regex101.
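In Ruby, that replacement might look like the sketch below. The method name drop_single_chars is only an illustrative assumption; the pattern is the one given above.

```ruby
# Hypothetical helper name; the regex is the one suggested in this answer.
def drop_single_chars(text)
  # Each match is one isolated character plus one neighbouring whitespace
  # character, so deleting it keeps the spacing between the remaining words.
  text.gsub(/\s\S(?!\S)|(?<!\S)\S\s/, '')
end

puts drop_single_chars('This is a x test string')
```

Because each match takes only one neighbouring whitespace character, consecutive single characters are each matched in turn.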

Upvotes: 0

user2705585

Reputation:

The regex principle to use here is the word boundary.

Try with \b[A-Za-z]\b Regex101 Demo

This works most of the time, except when the letter sits next to a non-word character. In a@, for example, a is treated as a single character because there is a word boundary between a and @, like this: a|@.
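A small Ruby sketch of that caveat, assuming a hypothetical helper named blank_single_letters and made-up sample strings:

```ruby
# Hypothetical helper: replaces every letter that has a word boundary
# on both sides with a space.
def blank_single_letters(text)
  text.gsub(/\b[A-Za-z]\b/, ' ')
end

# The lone "a" is replaced, as intended.
p blank_single_letters('This is a test')

# The "a" in "a@" is replaced too, because \b also matches between "a" and "@".
p blank_single_letters('mail a@ here')
```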

In that case you can use look-around assertions, which check for whitespace on both sides of the letter before it qualifies as a single character.

Regex: (?<=\s)[A-Za-z](?=\s) Regex101 Demo
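To see why lookarounds help with two consecutive single characters, the sketch below compares a pattern that consumes the surrounding spaces with the lookaround pattern above. The consuming pattern /\s[A-Za-z]\s/ is an assumption added here purely for contrast.

```ruby
text = 'This is a x test string'

# Assumed contrast pattern: the spaces are part of the match, so the space
# after "a" is used up and cannot also serve as the space before "x".
consumed = text.scan(/\s[A-Za-z]\s/)

# Lookaround pattern from above: the spaces are only checked, never consumed,
# so both letters are found.
checked = text.scan(/(?<=\s)[A-Za-z](?=\s)/)

p consumed
p checked
```

Regex matches never overlap, which is why the pattern in the question finds only the first of two adjacent single characters.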


Update #1:

To match any non-whitespace character, use \S or [^\s] in the search pattern.

The regex will be (?<=\s)[^\s](?=\s) or (?<=\s)\S(?=\s) Regex101 Demo


Update #2:

To match at the beginning or at the end of the string as well, add ^ and $ to the lookaround assertions.

Regex: (?<=^|\s)[^\s](?=\s|$) Regex101 Demo

Note: Use \A and \z instead of ^ and $ if the latter don't work.
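In Ruby, ^ and $ anchor at line boundaries, while \A and \z anchor at the start and end of the whole string, so a Ruby version could be sketched with the latter. The helper name remove_isolated_chars and the squeeze/strip clean-up are assumptions added for illustration, not part of the answer.

```ruby
# Hypothetical helper built on the pattern above, written with \A and \z.
def remove_isolated_chars(text)
  # Replace each isolated non-whitespace character with a space, then collapse
  # the resulting runs of spaces and trim both ends.
  text.gsub(/(?<=\A|\s)\S(?=\s|\z)/, ' ').squeeze(' ').strip
end

puts remove_isolated_chars('a This is b c test string d')
```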

Upvotes: 5

riteshtch

Reputation: 8769

You can use word boundary \b like this:

string = 'This is a x y z test string'
string.gsub(/\b\w\b/, ' ').gsub(/\s{2,}/, ' ')
=> "This is test string"

Other characters can be matched with a character class such as [\w\-], or any non-space character with a pattern like this: (?<=\s)\S(?=\s)

Upvotes: 0

Martin Svalin

Reputation: 2267

You can use a positive lookbehind (or lookahead). Then the space before (or after, with lookahead) won't be included in the match, and you replace with the empty string.

string = 'This is a x test string'
string.gsub(/(?<=\W|\A).(\W|\z)/, '')
=> "This is test string"

I'd restrict the character matched in between to a \w, and maybe move to Unicode-aware character classes.
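A rough sketch of both suggestions follows. In Ruby, \w covers only ASCII letters, digits and underscore, so the second variant assumes \p{L} (any Unicode letter) as one possible Unicode-aware class; the method names and the accented sample strings are hypothetical.

```ruby
# Variant restricted to \w, keeping the lookbehind from the example above.
def remove_single_word_chars(text)
  text.gsub(/(?<=\W|\A)\w(\W|\z)/, '')
end

# Assumed Unicode-aware variant: \p{L} is any letter, \P{L} any non-letter.
# The trailing "u" flag marks the pattern as UTF-8.
def remove_single_letters(text)
  text.gsub(/(?<=\P{L}|\A)\p{L}(\P{L}|\z)/u, '')
end

puts remove_single_word_chars('This is a x test string')

# "deja vu" written with an accented e and a: no letter stands alone.
puts remove_single_letters("d\u00E9j\u00E0 vu")

# A lone accented e followed by a lone "a": both are removed.
puts remove_single_letters("\u00E9 a test")
```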

Upvotes: 0
