Splitting on every character except for a preserved substring

Question

Given the string

word = "These"

that contains the tuple

pair = ("h", "e")

the aim is to split the word on every character except where the pair occurs, i.e. the output should be:

('T', 'he', 's', 'e')

I've tried:

word = 'These'
pair = ('h', 'e')
first, second = pair
pair_str = ''.join(pair)
pair_str = pair_str.replace('\\', '\\\\')
pattern = re.compile(r'(?



Note that the pair can sometimes contain slashes, backslashes or other special characters, hence the replace and escape calls in the code above.

Is there a simpler way to achieve the same string replacement?



EDITED

Specifics from comments:


  And is there a distinction between when both characters in the pair are unique and when they aren't?


Nope, they should be treated the same way.

Ry- · Accepted Answer

Match instead of splitting:

import re

pattern = re.escape(''.join(pair)) + '|.'
result = tuple(re.findall(pattern, word))

The pattern is the escaped pair followed by |. (for this example, he|.), which matches the pair where possible and a single character* otherwise.
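
As a rough sketch of how this could be packaged, here is the same match-instead-of-split idea wrapped in a function. The name split_keep_pair is hypothetical (it is not from the answer), and re.DOTALL is passed on the assumption that the word may contain newlines:

```python
import re

def split_keep_pair(word, pair):
    """Split word into single characters, keeping each occurrence of pair whole.

    Hypothetical helper for illustration only.
    """
    # re.escape neutralises backslashes and other regex metacharacters in the pair.
    pattern = re.escape(''.join(pair)) + '|.'
    # The alternation tries the pair first and falls back to any single character;
    # re.DOTALL lets '.' match newlines as well.
    return tuple(re.findall(pattern, word, re.DOTALL))

print(split_keep_pair("These", ("h", "e")))   # ('T', 'he', 's', 'e')
print(split_keep_pair("a\\/b", ("\\", "/")))  # ('a', '\\/', 'b')
```

Because the pair is listed first in the alternation, it wins whenever it can match at the current position, which also covers pairs made of two identical characters.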

You can also do this without regular expressions:

import itertools

non_pairs = word.split(''.join(pair))
result = [(''.join(pair),)] * (2 * len(non_pairs) - 1)
result[::2] = non_pairs
result = tuple(itertools.chain(*result))
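
If the slice assignment above feels too clever, an explicit left-to-right scan should give the same result. This is only a sketch under the assumption that the pair is non-empty; split_keep_pair_scan is a hypothetical name, not something from the answer:

```python
def split_keep_pair_scan(word, pair):
    """Walk the word once, emitting the pair whole wherever it starts.

    Hypothetical helper for illustration only; assumes a non-empty pair.
    """
    pair_str = ''.join(pair)
    if not pair_str:
        raise ValueError("pair must not be empty")
    pieces = []
    i = 0
    while i < len(word):
        if word.startswith(pair_str, i):
            # The pair begins at position i: keep it as one piece.
            pieces.append(pair_str)
            i += len(pair_str)
        else:
            # Otherwise emit a single character.
            pieces.append(word[i])
            i += 1
    return tuple(pieces)

print(split_keep_pair_scan("These", ("h", "e")))  # ('T', 'he', 's', 'e')
```

No escaping is needed here, since the pair is compared as a plain string rather than compiled into a pattern.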

* It doesn’t match newlines, though; if you have those, pass re.DOTALL as a third argument to re.findall.
