unitedsaga

Reputation: 111

Python/Pandas string split - keeping the splitter (separator)

I am trying to split a character string into blocks, where one block is a known multi-character substring. I believe I can achieve this if I am able to keep the separator in the output.

For example:

re.split('LBT', 'HLHLBTS')
['HLH', 'S'] # actual output
['HLH', 'LBT', 'S'] # needed output

The final output I am ultimately looking for:

['H', 'HL', 'HLH', 'HLHLBT','HLHLBTS'] # achievable once I have the list above

I have tried the following. It gets me the end result, but I have simply brute-forced it for this particular situation:

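seq = list('HLHLBTS')  # split the string into single characters
seqout = []
sout = []
s = ''
count = 0
cond = 'run'
for i in range(len(seq)):
    if count == 2:
        # both trailing characters of 'LBT' have been skipped
        cond = 'run'
        count = 0
    if cond == 'skip':
        count = count + 1
        continue
    # treat 'LBT' as one block; the bounds check keeps i+2 inside the list
    if seq[i] == 'L' and i <= len(seq) - 3:
        if seq[i+1] == 'B' and seq[i+2] == 'T':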
            w = 'LBT'
            cond = 'skip'
        else: 
            w = seq[i]
    else:
        w = seq[i]
    s = s+w
    sout.append(s)
seqout.append(sout)

Upvotes: 3

Views: 77

Answers (2)

Rob Raymond

Reputation: 31146

Use sub() to insert a delimiter around the separator, then split() on that delimiter:

import re

re.sub("^(.*)(LBT)(.*)$", r"\1|\2|\3", "HLHLBTS").split("|")

Output:

['HLH', 'LBT', 'S']
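
The pattern above is anchored to the whole string, so it marks only one occurrence of LBT (the last one, because the first group is greedy). A possible variant for strings with several occurrences is sketched below. The helper name split_keep_sep and the marker parameter are hypothetical, not part of the original answer, and the sketch assumes the marker character never appears in the input.

```python
import re

def split_keep_sep(text, sep="LBT", marker="|"):
    """Hypothetical helper: wrap every sep in marker, then split on marker."""
    # Assumption: marker never occurs in text; otherwise the split is ambiguous.
    if marker in text:
        raise ValueError("marker occurs in text; choose a different marker")
    # A function replacement avoids backslash-escaping issues in the marker.
    marked = re.sub(
        f"({re.escape(sep)})",
        lambda m: f"{marker}{m.group(1)}{marker}",
        text,
    )
    # Drop empty pieces produced when sep sits at an edge or repeats back to back.
    return [piece for piece in marked.split(marker) if piece]

print(split_keep_sep("HLHLBTS"))
print(split_keep_sep("ALBTBLBTC"))
```

Filtering out empty strings is a design choice here; keep them if the positions of adjacent separators matter.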

Upvotes: 0

Andrej Kesely

Reputation: 195418

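Wrap the pattern in a capturing group, i.e. put ( ) around the first argument (the pattern) of re.split. When the pattern contains a capturing group, re.split also returns the matched separators in the result list: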

import re

seq = re.split(r"(LBT)", "HLHLBTS")
print(seq)

Prints:

['HLH', 'LBT', 'S']
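
To get from this list to the cumulative output the question asks for, one possible sketch is shown below. It assumes Python 3; the helper name cumulative_blocks is hypothetical and not part of the original answer. It splits with a capturing group, breaks every non-separator piece into single characters, and then builds running prefixes with itertools.accumulate.

```python
import re
from itertools import accumulate

def cumulative_blocks(text, sep="LBT"):
    """Hypothetical helper: running prefixes of text, treating sep as one block."""
    # The capturing group makes re.split keep each separator in the result.
    parts = re.split(f"({re.escape(sep)})", text)
    blocks = []
    for part in parts:
        if part == sep:
            blocks.append(part)   # keep the separator as a single block
        else:
            blocks.extend(part)   # break everything else into characters
    # accumulate concatenates the blocks into running prefixes.
    return list(accumulate(blocks))

print(cumulative_blocks("HLHLBTS"))
# ['H', 'HL', 'HLH', 'HLHLBT', 'HLHLBTS']
```

Unlike the loop in the question, this does not depend on the separator's length or position, and it handles several occurrences of the separator.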

Upvotes: 4
