mr.SomeBody

Reputation: 13

Split string with two words

Is there a way to split a string on either of two delimiters?

Code example:

# every piece of text is followed by either "u1" or "u2"
a = "Hi thereu1hello ?u1Whatu2Goodu1Work worku2Stacku2"
# split the string on the two markers "u1" and "u2" and collect the text before each marker in two lists, like this:
u1 = ["Hi there", "hello ?", "Good"]
u2 = ["What", "Work work", "Stack"]

Upvotes: 1

Views: 2040

Answers (2)

Patrick Artner

Reputation: 51643

You can iterate the string character-wise and accumulate characters in a part-list until your last char in that list is 'u' and your current char is '1' or '2'.

You then join the part-list together again, omitting its last character (the 'u') and stuff it either in u1 or u2 and clear part:

a = "Hi thereu1hello ?u1Whatu2Goodu1Work worku2Stacku2"

u1 = []
u2 = []
part = []

# iterate your string character-wise
for c in a:
    # last character collected == u and now 1 or 2?
    if part and part[-1] == "u" and c in ["1","2"]:
        if c == "1":
            u1.append(''.join(part[:-1])) # join all collected chars, omit 'u'
            part=[]
        else:
            u2.append(''.join(part[:-1])) # see above, same.
            part=[]
    else:
        part.append(c)

# Note: there is no end condition. If the string ends with neither u1 nor u2,
# the last part of the string is not added to either list.

print(u1)    
print(u2)

Output:

['Hi there', 'hello ?', 'Good']
['What', 'Work work', 'Stack']
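If the input might not end in u1 or u2, the missing end condition mentioned in the comment can be handled by returning the leftover text. This is only a sketch: the function name split_by_marker and the choice to return the remainder as a third value are my assumptions, not part of the code above.

```python
def split_by_marker(text):
    """Collect the text before each 'u1' / 'u2' marker into two lists."""
    u1, u2, part = [], [], []
    for c in text:
        # previous collected char is 'u' and the current one is '1' or '2'?
        if part and part[-1] == "u" and c in ("1", "2"):
            target = u1 if c == "1" else u2
            target.append("".join(part[:-1]))  # drop the trailing 'u'
            part = []
        else:
            part.append(c)
    # Assumption: text not followed by a marker is returned separately
    # instead of being silently dropped.
    return u1, u2, "".join(part)


u1, u2, rest = split_by_marker("Hi thereu1Whatu2left over")
print(u1, u2, repr(rest))
# ['Hi there'] ['What'] 'left over'
```

For the string from the question the remainder is simply an empty string.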

A second approach is to remember certain indexes (where the last slice ended and where we are now) and slice the matching part directly from the input:

u1 = []
u2 = [] 

oldIdx = 0     # where to start slicing, update on append to either u1 or u2
lastOne = ""   # character in last iteration

for i,c in enumerate(a):  # get the index (i) and the character (c) from enumerate

    if lastOne == "u" and c in ["1","2"]:
        if c == "1":
            u1.append(a[oldIdx:i-1]) # slice the correct part from a            
        else:
            u2.append(a[oldIdx:i-1]) # slice the correct part from a

        oldIdx = i+1  # update slice starting position
        lastOne = ""  # reset last one
    else:
        lastOne = c   # remember char as lastOne

Storing a single integer and a character takes less memory and time than appending to a part list. You also do not need to join the parts before appending, because you slice directly from the source, so it's slightly more efficient.
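If you want to check the difference on your own machine, something like the following timeit sketch could be used. The function names with_part_list and with_slicing, the repeat factor, and the run count are arbitrary choices of mine; actual timings depend on input and Python version, so no numbers are shown.

```python
import timeit

# Assumption: a longer input, built by repeating the example string
a = "Hi thereu1hello ?u1Whatu2Goodu1Work worku2Stacku2" * 200


def with_part_list(text):
    """First approach: accumulate characters, join on each marker."""
    u1, u2, part = [], [], []
    for c in text:
        if part and part[-1] == "u" and c in ("1", "2"):
            (u1 if c == "1" else u2).append("".join(part[:-1]))
            part = []
        else:
            part.append(c)
    return u1, u2


def with_slicing(text):
    """Second approach: remember indexes and slice from the source."""
    u1, u2 = [], []
    old_idx, last = 0, ""
    for i, c in enumerate(text):
        if last == "u" and c in ("1", "2"):
            (u1 if c == "1" else u2).append(text[old_idx:i - 1])
            old_idx = i + 1
            last = ""
        else:
            last = c
    return u1, u2


# Both variants should produce the same lists before timing them
assert with_part_list(a) == with_slicing(a)

for fn in (with_part_list, with_slicing):
    print(fn.__name__, timeit.timeit(lambda: fn(a), number=10))
```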

Upvotes: 2

Aran-Fey

Reputation: 43136

You can use regex to implement a trivially extensible solution:

import re

a = "Hi thereu1hello ?u1Whatu2Goodu1Work worku2Stacku2"
separators = ['u1', 'u2']

regex = r'(.*?)({})'.format('|'.join(re.escape(sep) for sep in separators))
result = {sep: [] for sep in separators}
for match in re.finditer(regex, a, flags=re.S):
    text = match.group(1)
    sep = match.group(2)
    result[sep].append(text)

print(result)
# {'u1': ['Hi there', 'hello ?', 'Good'],
#  'u2': ['What', 'Work work', 'Stack']}

This constructs a regex out of the separators u1 and u2 like so:

(.*?)(u1|u2)

It then iterates over all matches of this regex and appends the text captured by the first group to the list of the separator captured by the second group.
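A possible variation, if you also want to keep text that is not followed by any separator: re.split includes the separators in its result when the pattern contains a capturing group, so text and separators alternate in the returned list. The function name split_on_separators and the extra separator u3 below are made up for illustration.

```python
import re


def split_on_separators(text, separators):
    """Group the text before each separator; also return trailing text."""
    pattern = "({})".format("|".join(re.escape(sep) for sep in separators))
    # With a capturing group, re.split returns:
    # [text, sep, text, sep, ..., trailing text]
    pieces = re.split(pattern, text)
    result = {sep: [] for sep in separators}
    for chunk, sep in zip(pieces[0::2], pieces[1::2]):
        result[sep].append(chunk)
    return result, pieces[-1]


result, rest = split_on_separators("au1bu3cu2d", ["u1", "u2", "u3"])
print(result, repr(rest))
# {'u1': ['a'], 'u2': ['c'], 'u3': ['b']} 'd'
```

Note that regex alternation tries the separators from left to right, so if one separator is a prefix of another (say u1 and u10), the longer one should come first in the list.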

Upvotes: 1
