Reputation: 4432
Hi, I am trying to devise a regex that joins any run of consecutive single-character words in a string. Some examples:
'A B C Industries' => 'ABC Industries'
'Industries A B C' => 'Industries ABC'
'Foo A B C Industries' => 'Foo ABC Industries'
'Foo A B C Industries X Y Z Product' => 'Foo ABC Industries XYZ Product'
etc.
The following are the two attempts I have made (both incomplete):
1)
''.join(r'(?<=\s\S)\s|(?<=^\S)\s')
2)
'\S+'.findall()
and then loop over the output.
Is there a regex that can do this in one fell swoop?
Upvotes: 1
Views: 469
Reputation: 70732
You can combine a lookbehind with a lookahead and use re.sub to do the replacement.
(?i)(?<=\b[a-z]) (?=[a-z]\b)
Explanation:
(?i) # set flags for this block (case-insensitive)
(?<= # look behind to see if there is:
\b # the boundary between a word char (\w) and not a word char
[a-z] # any character of: 'a' to 'z'
) # end of look-behind
  # ' ' (a literal space, the only character actually consumed)
(?= # look ahead to see if there is:
[a-z] # any character of: 'a' to 'z'
\b # the boundary between a word char (\w) and not a word char
) # end of look-ahead
Example:
import re
s1 = 'A B C Industries'
s2 = 'Industries A B C'
s3 = 'Foo A B C Industries'
s4 = 'Foo A B C Industries X Y Z Product'
s5 = 'F O O B A R and b a z'
for s in [s1, s2, s3, s4, s5]:
    print(re.sub(r'(?i)(?<=\b[a-z]) (?=[a-z]\b)', '', s))
Output:
ABC Industries
Industries ABC
Foo ABC Industries
Foo ABC Industries XYZ Product
FOOBAR and baz
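The annotated breakdown above can also live in the code itself. This is a minimal sketch, assuming you want a reusable helper: the names SINGLE_LETTER_GAP and join_single_letters are made up for illustration, and the pattern is the same one as in the answer, just laid out with re.VERBOSE. Note that a bare space is ignored in verbose mode, so the space to match is written as [ ].

```python
import re

# Hypothetical helper names; the pattern is the one from the answer,
# spread over several lines with re.VERBOSE so the comments stay with it.
SINGLE_LETTER_GAP = re.compile(r"""
    (?<= \b [a-z] )   # look behind: a single letter that starts a word
    [ ]               # the space to remove (bare spaces are ignored in VERBOSE mode)
    (?= [a-z] \b )    # look ahead: a single letter that ends a word
""", re.VERBOSE | re.IGNORECASE)


def join_single_letters(text):
    """Remove the spaces between consecutive one-letter words."""
    return SINGLE_LETTER_GAP.sub("", text)


print(join_single_letters("Foo A B C Industries X Y Z Product"))
# -> Foo ABC Industries XYZ Product
```

Compiling once is only a convenience here; calling re.sub with the pattern string each time gives the same result.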
Upvotes: 5
Reputation: 897
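You can simply use the re module's search and replace:
output = re.sub(r"(?<!\w{2}) (?!\w{2})", '', text)
This removes every space that has neither two word characters directly before it nor two directly after it, i.e. spaces that are surrounded by single characters.
Edit: I don't use \w+ because Python's re module rejects it inside a look-behind: 'look-behind requires fixed-width pattern'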
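To make the fixed-width point concrete, here is a small sketch. The names FIXED_WIDTH and join_single_chars are hypothetical, chosen only for this example. It first shows that a \w+ look-behind fails to compile, then applies the \w{2} version. Since \w also matches digits and underscores, this pattern joins single digits too, which the [a-z] pattern would leave alone.

```python
import re

# A variable-width look-behind is rejected by the re module at compile time.
try:
    re.compile(r"(?<!\w+) ")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

# The fixed-width pattern from this answer, written as a raw string.
FIXED_WIDTH = re.compile(r"(?<!\w{2}) (?!\w{2})")


def join_single_chars(text):
    """Remove spaces that have fewer than two word chars on each side."""
    return FIXED_WIDTH.sub("", text)


print(variable_width_ok)
# -> False
print(join_single_chars("Foo A B C Industries X Y Z Product"))
# -> Foo ABC Industries XYZ Product
print(join_single_chars("Room 1 2 3"))
# -> Room 123
```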
Upvotes: 0