Reputation: 273
I need to split a list into sublists based on sequences of strings present in a text file as shown below (note that the sublists may not overlap and that you can't have a pattern that is a subset of another).
Patterns:
cat,dog
dog,cow
list = ['chicken','cat','dog','dog','cow','bat']
Output: [chicken,[cat,dog],[dog,cow],bat]
Certainly, I can do this in a naive way using some list splits and merging the pieces back together at the end (see Edit 2), but this is ugly, and I figured there must be a more Pythonic way to do this. There are some similar questions that use itertools or similar, but none of them are quite what I want (they mostly involve matching on a common characteristic, which isn't present here).
Edit: list items can occur multiple times (so a pattern may occur more than once). Patterns can also contain any number of elements > 1
Edit 2: Something like the following is the naive approach I was thinking of. Note that I haven't implemented this and it most likely has several issues (including that I'm not inserting at the proper indices in the second for loop), but I think it demonstrates the algorithm I thought of at first.
l1 = input list
list l2 = [100]  # Just presetting the size for now
for pattern in patterns:
    find(index where pattern starts in l1)
    s = split list (start of pattern:end of pattern)
    list[start of pattern] = s
for l in l2:
    if l2[l] is empty:
        l2[l] = l1[l]
Upvotes: 0
Views: 497
Reputation: 123463
The following seems to meet your requirements. It uses the optional else clause that for loops can have to handle the case when none of the patterns matched (in which case the current element should just be copied to the result).
patterns = ['cat', 'dog'], ['dog', 'cow']
elements = ['chicken', 'cat', 'dog', 'dog', 'cow', 'bat']

result = []
i = 0
while i < len(elements):
    # Try each pattern at the current position.
    for pattern in patterns:
        if pattern == elements[i:i + len(pattern)]:
            result.append(pattern)
            i += len(pattern)  # skip past the matched items
            break
    else:
        # No break occurred, so no pattern matched here: copy the element.
        result.append(elements[i])
        i += 1

print(result)  # -> ['chicken', ['cat', 'dog'], ['dog', 'cow'], 'bat']
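Since the question says the patterns come from a text file with one comma-separated pattern per line, the same loop could be wrapped up roughly as below. This is only a sketch: the names load_patterns and group_patterns are made up for illustration, io.StringIO stands in for the real file (whose name isn't given), and it assumes every pattern line is non-empty with items separated by plain commas.

```python
import io


def load_patterns(lines):
    """Parse one comma-separated pattern per line, skipping blank lines.

    Hypothetical helper: the file format is assumed from the question.
    """
    patterns = []
    for line in lines:
        line = line.strip()
        if line:
            patterns.append([item.strip() for item in line.split(',')])
    return patterns


def group_patterns(elements, patterns):
    """Return a new list in which each matched pattern becomes a sublist."""
    result = []
    i = 0
    while i < len(elements):
        for pattern in patterns:
            if pattern == elements[i:i + len(pattern)]:
                result.append(list(pattern))  # copy, so sublists are independent
                i += len(pattern)
                break
        else:
            # The for loop finished without break: nothing matched at i.
            result.append(elements[i])
            i += 1
    return result


# io.StringIO stands in for an open text file here.
pattern_file = io.StringIO("cat,dog\ndog,cow\n")
patterns = load_patterns(pattern_file)

# 'cat', 'dog' appears twice to show that repeated matches are handled.
elements = ['chicken', 'cat', 'dog', 'dog', 'cow', 'bat', 'cat', 'dog']
grouped = group_patterns(elements, patterns)
print(grouped)
```

Patterns are tried in file order at each position, so the first one that matches wins; with the no-overlap and no-subset constraints from the question that ordering should not matter.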
Upvotes: 1