How to merge items in a list, based on the first two characters of the next item in the list

Question

I have this code, inspired by other answers, which now successfully merges items starting with '##' into the previous item in the list. However, I get odd behaviour: the last item disappears.

List:

tokens = ['Hello', 'this', 'is', 'a', 's', '##e', '##ntenc', '##e']

Checking whether something is a subtoken (one that starts with ##):

def is_subtoken(string):
    if string[:2] == "##":
        return True
    else:
        return False

Merging the tokens

merged_text = []
for i in range(len(tokens)):
    if not is_subtoken(tokens[i]) and (i+1)


This is the output:
['Hello', 'this', 'is', 'a', 'sentenc']

Whereas I expected:
['Hello', 'this', 'is', 'a', 'sentence']

I can't get my head around it. Is something missing that is needed to merge several consecutive '##' items?
Thank you very much.

acushner · Accepted Answer

you could just use join, replace, and split pretty easily:

'|'.join(tokens).replace('|##', '').split('|')
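As a runnable sketch of the one-liner above, using the tokens from the question. The name SEP is only illustrative, and the approach assumes the separator character never occurs inside a token; if it might, a different separator would be needed.

```python
tokens = ['Hello', 'this', 'is', 'a', 's', '##e', '##ntenc', '##e']

# Assumption: "|" does not appear in any token, so it is safe as a separator.
SEP = "|"

# Join everything, delete each "separator + ##" so subtokens fuse with the
# preceding token, then split back into a list.
merged = SEP.join(tokens).replace(SEP + "##", "").split(SEP)

print(merged)  # ['Hello', 'this', 'is', 'a', 'sentence']
```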

edit: you're missing the last element because your loop never adds it unless it's not a subtoken
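If you would rather keep an explicit loop, one way to avoid losing the last element is to look backwards instead of forwards: append every normal token, and glue every subtoken onto whatever was appended last. This is only a sketch, not the original loop; merge_subtokens is a name made up for illustration.

```python
def is_subtoken(token):
    # A subtoken is marked by a leading "##".
    return token.startswith("##")


def merge_subtokens(tokens):
    merged = []
    for token in tokens:
        if is_subtoken(token) and merged:
            # Strip the "##" marker and extend the previous item.
            merged[-1] += token[2:]
        else:
            # Normal token (or a leading subtoken with nothing before it,
            # which is kept unchanged).
            merged.append(token)
    return merged


tokens = ['Hello', 'this', 'is', 'a', 's', '##e', '##ntenc', '##e']
print(merge_subtokens(tokens))  # ['Hello', 'this', 'is', 'a', 'sentence']
```

Because each token is handled exactly once and nothing depends on tokens[i+1], there is no special case for the end of the list, and any number of consecutive '##' items gets merged.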
