Tomek Sztuk

Reputation: 69

How to join certain items in list

My list looks like this:

['', 'CCCTTTCGCGACTAGCTAATCTGGCATTGTCAATACAGCGACGTTTCCGTTACCCGGGTGCTGACTTCATACTT
CGAAGA', 'ACCGGGCCGCGGCTACTGGACCCATATCATGAACCGCAGGTG', '', '', 'AGATAAGCGTATCACG
ACCTCGTGATTAGCTTCGTGGCTACGGAAGACCGCAACAGGCCGCTCTTCTGATAAGTGTGCGG', '', '', 'ATTG
TCTTACCTCTGGTGGCATTGCAACAATGCAAATGAGAGTCACAAGATTTTTCTCCGCCCGAGAATTTCAAAGCTGT', '
TGAAGAGAGGGTCGCTAATTCGCAATTTTTAACCAAAAGGCGTGAAGGAATGTTTGCAGCTACGTCCGAAGGGCCACATA
', 'TTTTTTTAGCACTATCCGTAAATGGAAGGTACGATCCAGTCGACTAT', '', '', 'CCATGGACGGTTGGGGG
CCACTAGCTCAATAACCAACCCACCCCGGCAATTTTAACGTATCGCGCGGATATGTTGGCCTC', 'GACAGAGACGAGT
TCCGGAACTTTCTGCCTTCACACGAGCGGTTGTCTGACGTCAACCACACAGTGTGTGTGCGTAAATT', 'GGCGGGTGT
CCAGGAGAACTTCCCTGAAAACGATCGATGACCTAATAGGTAA', '']

These are sample DNA sequences read from a file. The list can be of any length, and a single sequence can have as few as 10 or as many as 10,000 letters. In the source file, sequences are delimited by empty lines, hence the empty items in the list. How can I join all the items between the empty ones into single strings?

Upvotes: 2

Views: 1297

Answers (1)

Óscar López

Reputation: 236004

Try this. It's a quick and dirty solution that works fine as long as no sequence contains a ',' character, but it won't be efficient if the input list is really big:

lst = ['GATTACA', 'etc']
[x for x in ''.join(',' if not e else e for e in lst).split(',') if x]

This is how it works, using generator expressions and list comprehensions from the inside-out:

  • ',' if not e else e for e in lst : replace all '' strings in the list with ','
  • ''.join(',' if not e else e for e in lst) : join all the strings together. Now the sequences are separated by one or more , characters
  • ''.join(',' if not e else e for e in lst).split(',') : split the string wherever there is a , character. This produces a list that still contains some empty strings
  • [x for x in ''.join(',' if not e else e for e in lst).split(',') if x] : finally, remove the empty strings, leaving a list of sequences
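To make the steps above concrete, here is a small sketch that runs them one at a time on a shortened list. The sample data and the intermediate variable names (joined, pieces, sequences) are made up for illustration and are not part of the original one-liner.

```python
# Hypothetical sample data; '' marks an empty line from the source file.
lst = ['', 'CCC', 'TTT', '', '', 'GAT', 'TACA', '']

# Steps 1 and 2: replace each '' with a ',' marker and join everything.
joined = ''.join(',' if not e else e for e in lst)

# Step 3: split at the markers; adjacent markers produce empty strings.
pieces = joined.split(',')

# Step 4: drop the empty strings, leaving only the joined sequences.
sequences = [x for x in pieces if x]

print(joined)     # ,CCCTTT,,GATTACA,
print(pieces)     # ['', 'CCCTTT', '', 'GATTACA', '']
print(sequences)  # ['CCCTTT', 'GATTACA']
```

Note that this only works because ',' never appears inside a DNA sequence; any other character that cannot occur in the data would do as a marker.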

Alternatively, the same functionality could be written in a longer way using explicit loops, like this:

answer  = [] # final answer
partial = [] # partial answer
for e in lst:
    if e == '':           # if current element is an empty string … 
        if partial:       # … and there's a partial answer
            answer.append(''.join(partial)) # join and append partial answer
            partial = []  # reset partial answer
    else:                 # otherwise it's a new element of partial answer
        partial.append(e) # add it to partial answer
if partial:               # after the loop, if one partial answer is left
    answer.append(''.join(partial)) # add it to final answer

The idea is the same: we accumulate the non-empty strings, and whenever an empty string is found, we join the accumulated values and append the result to the answer, taking care to add the last group after the loop ends. The result ends up in the answer variable, and this solution makes only a single pass over the input.
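As a different technique from the two shown in this answer, the same single-pass grouping can probably also be sketched with itertools.groupby from the standard library, which groups consecutive items sharing a key. The function name join_runs and the sample data are hypothetical, chosen only for this example.

```python
from itertools import groupby

def join_runs(items):
    """Join each run of consecutive non-empty strings into one string."""
    # key=bool is False for '' and True for any non-empty string, so
    # groupby yields alternating runs of empty and non-empty items.
    return [''.join(group)
            for non_empty, group in groupby(items, key=bool)
            if non_empty]

# Hypothetical sample data; '' marks an empty line from the source file.
lst = ['', 'CCC', 'TTT', '', '', 'GAT', 'TACA', '']
print(join_runs(lst))  # ['CCCTTT', 'GATTACA']
```

Unlike the one-liner, this version does not depend on a marker character being absent from the data.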

Upvotes: 2
