asb

Reputation: 4432

Regex to join single chars

Hi, I am trying to devise a regex that joins any run of consecutive single characters in a string. Here are some examples:

'A B C Industries' => 'ABC Industries'
'Industries A B C' => 'Industries ABC'
'Foo A B C Industries' => 'Foo ABC Industries'
'Foo A B C Industries X Y Z Product' => 'Foo ABC Industries XYZ Product'

etc.

The following are the two attempts I have made (both incomplete):

1)

''.join(r'(?<=\s\S)\s|(?<=^\S)\s')

2)

re.findall(r'\S+', s)

and then loop over the output.
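For reference, the loop-based idea in attempt 2 could be completed roughly as follows. This is only a sketch: the function name `join_single_chars` is made up for illustration, and it assumes that collapsing runs of whitespace into one space is acceptable.

```python
import re

def join_single_chars(text):
    # Hypothetical helper: split into whitespace-delimited tokens (attempt 2),
    # then glue each single-character token onto a preceding single-character run.
    tokens = re.findall(r'\S+', text)
    result = []
    prev_single = False
    for token in tokens:
        is_single = len(token) == 1
        if is_single and prev_single:
            result[-1] += token      # extend the current run, e.g. 'A' -> 'AB'
        else:
            result.append(token)     # start a new word or a new run
        prev_single = is_single
    # Note: any run of whitespace in the input becomes a single space here.
    return ' '.join(result)

print(join_single_chars('Foo A B C Industries X Y Z Product'))
```

This works, but it takes a loop and some bookkeeping.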

Is there a regex that can do this in one fell swoop?

Upvotes: 1

Views: 469

Answers (2)

hwnd

Reputation: 70732

You can combine a lookbehind and a lookahead, and use re.sub for the replacement.

(?i)(?<=\b[a-z]) (?=[a-z]\b)

Explanation:

(?i)          # set flags for this block (case-insensitive)
(?<=          # look behind to see if there is:
  \b          #   the boundary between a word char (\w) and not a word char
  [a-z]       #   any character of: 'a' to 'z'
)             # end of look-behind
              # ' '
(?=           # look ahead to see if there is:
  [a-z]       #   any character of: 'a' to 'z'
  \b          #   the boundary between a word char (\w) and not a word char
)             # end of look-ahead

Example:

import re

s1 = 'A B C Industries'
s2 = 'Industries A B C'
s3 = 'Foo A B C Industries'
s4 = 'Foo A B C Industries X Y Z Product'
s5 = 'F O O B A R and b a z'

for s in [s1, s2, s3, s4, s5]:
    print(re.sub(r'(?i)(?<=\b[a-z]) (?=[a-z]\b)', '', s))

Output:

ABC Industries
Industries ABC
Foo ABC Industries
Foo ABC Industries XYZ Product
FOOBAR and baz
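If the pattern is used in more than one place, it may be convenient to compile it once and wrap it in a function. The sketch below is one way to do that. The names `SINGLE_CHAR_GAP`, `join_single_letters` and `join_single_letters_ws` are made up for illustration, and the `\s+` variant is an assumption about inputs that contain tabs or repeated spaces, which the question does not mention.

```python
import re

# Same pattern as above, with the flag passed to compile() instead of (?i).
SINGLE_CHAR_GAP = re.compile(r'(?<=\b[a-z]) (?=[a-z]\b)', re.IGNORECASE)

# Assumed variant: also remove tabs or repeated spaces between single letters.
SINGLE_CHAR_GAP_WS = re.compile(r'(?<=\b[a-z])\s+(?=[a-z]\b)', re.IGNORECASE)

def join_single_letters(text):
    # Remove each single space that sits between two one-letter words.
    return SINGLE_CHAR_GAP.sub('', text)

def join_single_letters_ws(text):
    # Same idea, but any run of whitespace between one-letter words is removed.
    return SINGLE_CHAR_GAP_WS.sub('', text)

print(join_single_letters('Foo A B C Industries X Y Z Product'))
print(join_single_letters_ws('A\tB  C Ltd'))
```

Because both the lookbehind and the lookahead are zero-width, only the whitespace itself is consumed, so overlapping runs such as 'A B C' are handled in a single pass.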

Upvotes: 5

natronite

Reputation: 897

You can simply use the re package's search and replace:

output = re.sub(r"(?<!\w{2}) (?!\w{2})", '', text)

This removes spaces that are surrounded by single characters, i.e. spaces that are neither preceded nor followed by two word characters.

Edit: I don't use \w+ because Python's re module does not accept it inside a look-behind: 'look-behind requires fixed-width pattern'.
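A small sketch of how this pattern behaves, including the fixed-width restriction mentioned in the edit. The helper names `join_loose` and `lookbehind_compiles` are hypothetical, and the 'A & B Co' input is an assumed edge case that is not part of the question.

```python
import re

# Negative lookarounds: the space must not have two word chars on either side.
LOOSE_GAP = re.compile(r'(?<!\w{2}) (?!\w{2})')

def join_loose(text):
    return LOOSE_GAP.sub('', text)

def lookbehind_compiles(pattern):
    # Returns False when re rejects the pattern, e.g. a variable-width look-behind.
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(join_loose('Foo A B C Industries X Y Z Product'))
print(lookbehind_compiles(r'(?<!\w+) '))    # variable width, rejected by re
print(lookbehind_compiles(r'(?<!\w{2}) '))  # fixed width, accepted

# Assumed edge case: punctuation is not a word character, so it is joined too.
print(join_loose('A & B Co'))
```

Since this pattern only checks that two word characters are absent, it also joins around punctuation, which a letter-based pattern such as `[a-z]` with `\b` would leave alone.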

Upvotes: 0
