Reputation: 69
My list looks like this:
['', 'CCCTTTCGCGACTAGCTAATCTGGCATTGTCAATACAGCGACGTTTCCGTTACCCGGGTGCTGACTTCATACTT
CGAAGA', 'ACCGGGCCGCGGCTACTGGACCCATATCATGAACCGCAGGTG', '', '', 'AGATAAGCGTATCACG
ACCTCGTGATTAGCTTCGTGGCTACGGAAGACCGCAACAGGCCGCTCTTCTGATAAGTGTGCGG', '', '', 'ATTG
TCTTACCTCTGGTGGCATTGCAACAATGCAAATGAGAGTCACAAGATTTTTCTCCGCCCGAGAATTTCAAAGCTGT', '
TGAAGAGAGGGTCGCTAATTCGCAATTTTTAACCAAAAGGCGTGAAGGAATGTTTGCAGCTACGTCCGAAGGGCCACATA
', 'TTTTTTTAGCACTATCCGTAAATGGAAGGTACGATCCAGTCGACTAT', '', '', 'CCATGGACGGTTGGGGG
CCACTAGCTCAATAACCAACCCACCCCGGCAATTTTAACGTATCGCGCGGATATGTTGGCCTC', 'GACAGAGACGAGT
TCCGGAACTTTCTGCCTTCACACGAGCGGTTGTCTGACGTCAACCACACAGTGTGTGTGCGTAAATT', 'GGCGGGTGT
CCAGGAGAACTTCCCTGAAAACGATCGATGACCTAATAGGTAA', '']
These are sample DNA sequences read from a file. The list can vary in length, and a single sequence can have anywhere from 10 to 10,000 letters. In the source file, sequences are delimited by empty lines, hence the empty items in the list. How can I join all the items between the empty ones?
Upvotes: 2
Views: 1297
Reputation: 236004
Try this. It's a quick and dirty solution that works fine, but it won't be efficient if the input list is really big, since it builds one large intermediate string:
lst = ['GATTACA', 'etc']
[x for x in ''.join(',' if not e else e for e in lst).split(',') if x]
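As a quick check, here is the same expression applied to a short made-up list. The sample sequences and the names `joined` and `result` are illustrative only and are not from the question; the comments show what each line prints.

```python
# Made-up sample data: empty strings mark the gaps between sequences.
lst = ['', 'CCC', 'TTT', '', '', 'AGA', '', 'GG', 'CC', '']

# Intermediate string: every '' becomes ',' and everything is concatenated.
joined = ''.join(',' if not e else e for e in lst)
print(joined)   # ,CCCTTT,,AGA,GGCC,

# Final list: split on ',' and keep only the non-empty pieces.
result = [x for x in joined.split(',') if x]
print(result)   # ['CCCTTT', 'AGA', 'GGCC']
```

Note that this relies on ',' never appearing inside a sequence, which holds for DNA letters.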
This is how it works, using generator expressions and list comprehensions, from the inside out:
',' if not e else e for e in lst
: replace every '' string in the list with ','
''.join(',' if not e else e for e in lst)
: join all the strings together. Now the sequences are separated by one or more , characters
''.join(',' if not e else e for e in lst).split(',')
: split the string wherever there is a , character; this produces a list, which may contain empty strings
[x for x in ''.join(',' if not e else e for e in lst).split(',') if x]
: finally, remove the empty strings, leaving a list of sequences
Alternatively, the same functionality could be written in a longer way using explicit loops, like this:
answer = []   # final answer
partial = []  # partial answer
for e in lst:
    if e == '':                              # if current element is an empty string …
        if partial:                          # … and there's a partial answer
            answer.append(''.join(partial))  # join and append partial answer
            partial = []                     # reset partial answer
    else:                                    # otherwise it's a new element of partial answer
        partial.append(e)                    # add it to partial answer
else:                                        # this part executes after the loop exits
    if partial:                              # if one partial answer is left
        answer.append(''.join(partial))      # add it to final answer
The idea is the same: we keep track of the non-empty strings and accumulate them, and whenever an empty string is found, we join the accumulated values and add the result to the answer, taking care to add the last group after the loop ends. The result ends up in the answer variable, and this solution makes only a single pass over the input.
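If the loop is needed in more than one place, it could be wrapped in a function. The following is only a sketch of that idea: the name `join_between_blanks` and the sample list are made up for illustration. It uses a plain check after the loop instead of `for`/`else`, which behaves the same here because the loop never uses `break`.

```python
def join_between_blanks(items):
    """Join runs of non-empty strings; empty strings act as separators."""
    answer = []   # completed sequences
    partial = []  # pieces of the sequence currently being built
    for e in items:
        if e:
            partial.append(e)                # non-empty: extend the current run
        elif partial:
            answer.append(''.join(partial))  # empty: close the current run, if any
            partial = []
    if partial:                              # flush a run not followed by ''
        answer.append(''.join(partial))
    return answer


# Hypothetical sample data, ending without a trailing empty string.
sample = ['', 'CCC', 'TTT', '', '', 'AGA', '', 'GG', 'CC']
print(join_between_blanks(sample))  # ['CCCTTT', 'AGA', 'GGCC']
```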
Upvotes: 2