Reputation: 122112
Given the string
word = "These"
that contains the tuple
pair = ("h", "e")
the aim is to split the word into single characters while keeping the pair
together as one element, i.e. the output should be:
('T', 'he', 's', 'e')
I've tried:
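import re

word = 'These'
pair = ('h', 'e')
first, second = pair
pair_str = ''.join(pair)
# Double any backslashes so pattern.sub() inserts the replacement literally
pair_str = pair_str.replace('\\', '\\\\')
# Match the space-separated pair only when it is delimited by whitespace
pattern = re.compile(r'(?<!\S)' + re.escape(first + ' ' + second) + r'(?!\S)')
new_word = ' '.join(word)  # 'T h e s e'
new_word = pattern.sub(pair_str, new_word)
result = tuple(new_word.split())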
Note that the pair tuple can sometimes contain slashes, backslashes, or other characters that need escaping, hence the replace and re.escape calls in the code above.
Is there a simpler way to achieve the same string replacement?
Specifics from comments:
And is there a distinction between when both characters in the pair are unique and when they aren't?
Nope, they should be treated the same way.
Upvotes: 2
Views: 80
Reputation: 15310
You can do it without using regular expressions:
import functools
word = 'These here when she'
pair = ('h', 'e')
digram = ''.join(pair)
parts = map(list, word.split(digram))
lex = lambda pre,post: post if pre is None else pre+[digram]+post
print(functools.reduce(lex, parts, None))
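The reduce-based snippet above prints a list, whereas the question asks for a tuple. A minimal sketch of the same idea wrapped in a function follows; the name split_keep_digram is hypothetical and not part of the original answer.

```python
import functools

def split_keep_digram(word, pair):
    # Hypothetical helper name; wraps the reduce-based approach shown above.
    digram = ''.join(pair)
    # Split on the digram, then break every remaining piece into characters.
    parts = map(list, word.split(digram))

    def lex(pre, post):
        # The first piece starts the result; later pieces are joined by the digram.
        return post if pre is None else pre + [digram] + post

    return tuple(functools.reduce(lex, parts, None))

print(split_keep_digram('These', ('h', 'e')))
```

Because str.split treats its argument as a literal string, no escaping is needed here even if the pair contains backslashes.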
Upvotes: 1
Reputation: 224942
Match instead of splitting:
import re
pattern = re.escape(''.join(pair)) + '|.'
result = tuple(re.findall(pattern, word))
The pattern is <pair>|. which matches the pair if possible and a single character* otherwise.
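As a self-contained sketch of the matching approach, the function below (the name split_keeping_pair is hypothetical) applies it to the word from the question and to a pair containing a backslash, which is the case re.escape is there to handle.

```python
import re

def split_keeping_pair(word, pair):
    # Hypothetical helper name, not from the original answer.
    # re.escape makes characters such as backslashes in the pair literal.
    pattern = re.escape(''.join(pair)) + '|.'
    # Alternatives are tried left to right, so the pair wins wherever it occurs.
    return tuple(re.findall(pattern, word))

print(split_keeping_pair('These', ('h', 'e')))    # ('T', 'he', 's', 'e')
print(split_keeping_pair('a\\b', ('\\', 'b')))    # ('a', '\\b')
```

The order of the alternation matters: with '.|' first, every match would be a single character and the pair would never be kept together.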
You can also do this without regular expressions:
import itertools
non_pairs = word.split(''.join(pair))
result = [(''.join(pair),)] * (2 * len(non_pairs) - 1)
result[::2] = non_pairs
result = tuple(itertools.chain(*result))
* It doesn’t match newlines, though; if you have those, pass re.DOTALL as a third argument to re.findall.
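To illustrate the footnote, here is a small sketch using a made-up input that contains a newline; the variable names are for illustration only.

```python
import re

word = 'The\nhe'  # illustrative input with an embedded newline
pattern = re.escape('he') + '|.'

# Without the flag, '.' skips the newline, so it is silently dropped.
without_flag = tuple(re.findall(pattern, word))
# With re.DOTALL, '.' also matches the newline and it is kept as an element.
with_flag = tuple(re.findall(pattern, word, re.DOTALL))

print(without_flag)  # ('T', 'he', 'he')
print(with_flag)     # ('T', 'he', '\n', 'he')
```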
Upvotes: 3