larosamatt

Reputation: 13

Python: List not correct after appending lines

I'm reading lines from a file and appending them to an empty list. I've already stripped carriage returns and newlines from each line, but what should be a single sequence is being added to the list as two separate items.

DNA = open('DNAGCex.txt')
DNAID = []
DNASEQ = []
for line in DNA:
    line = line.rstrip()
    line = line.lstrip()
    if line.startswith('>')==True:
        DNAID.append(line)
    if line.startswith('>')==False:
        DNASEQ.append(line)
print DNAID
print DNASEQ

And here's the output

['>Rosalind_6404', '>Rosalind_5959', '>Rosalind_0808']
['CCTGCGGAAGATCGGCACTAGA', 'TCCCACTAATAATTCTGAGG', 'CCATCGGTAGCGCATCCTTAGTCCA', 'ATATCCATTTGTCAGCAGACACGC', 'CCACCCTCGTGGTATGGCTAGGCATTCAG', 'TGGGAACCTGCGGGCAGTAGGTGGAAT']

I want it to look like this:

['>Rosalind_6404', '>Rosalind_5959', '>Rosalind_0808']
['CCTGCGGAAGATCGGCACTAGATCCCACTAATAATTCTGAGG', 'CCATCGGTAGCGCATCCTTAGTCCAATATCCATTTGTCAGCAGACACGC', 'CCACCCTCGTGGTATGGCTAGGCATTCAGTGGGAACCTGCGGGCAGTAGGTGGAAT']

Here is the source material, just remove the ''s:

['>Rosalind_6404' CCTGCGGAAGATCGGCACTAGA TCCCACTAATAATTCTGAGG '>Rosalind_5959' CCATCGGTAGCGCATCCTTAGTCCA ATATCCATTTGTCAGCAGACACGC '>Rosalind_0808' CCACCCTCGTGGTATGGCTAGGCATTCAG TGGGAACCTGCGGGCAGTAGGTGGAAT]

Upvotes: 1

Views: 103

Answers (2)

Brent Washburne

Reputation: 13158

You can combine the .lstrip() and .rstrip() into a single .strip() call.
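As a quick check of that equivalence, here is a minimal sketch using a made-up line (the sample string is an assumption for illustration, not taken from the file):

```python
# Hypothetical raw line with leading spaces and a Windows-style line ending.
raw = '  CCTGCGGAAGATCGGCACTAGA\r\n'

two_step = raw.rstrip().lstrip()  # what the question's code does
one_step = raw.strip()            # a single call that trims both ends

print(two_step == one_step)
print(repr(one_step))
```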

Also, .append() only adds a new item to a list; it does not join lines together. Here, we append an empty string to DNASEQ each time an ID line is found, then use += to join the lines that follow onto that string:

DNA = open('DNAGCex.txt')
DNAID = []
DNASEQ = []
for line in DNA:
    line = line.strip()
    if line.startswith('>'):
        DNAID.append(line)
        DNASEQ.append('')
    else:
        DNASEQ[-1] += line
print DNAID
print DNASEQ
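
To check this against the sample in the question without needing the file, the same loop can be wrapped in a function that takes any iterable of lines. This is only a sketch: `parse_fasta` and `sample` are names made up for illustration, and it assumes the first non-empty line is a `>` header.

```python
def parse_fasta(lines):
    """Return (ids, seqs) from an iterable of FASTA-style lines."""
    ids = []
    seqs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            ids.append(line)
            seqs.append('')   # start a new, empty sequence for this ID
        else:
            seqs[-1] += line  # extend the most recent sequence
    return ids, seqs


# The question's source material, one list item per line of the file.
sample = [
    '>Rosalind_6404\n',
    'CCTGCGGAAGATCGGCACTAGA\n',
    'TCCCACTAATAATTCTGAGG\n',
    '>Rosalind_5959\n',
    'CCATCGGTAGCGCATCCTTAGTCCA\n',
    'ATATCCATTTGTCAGCAGACACGC\n',
    '>Rosalind_0808\n',
    'CCACCCTCGTGGTATGGCTAGGCATTCAG\n',
    'TGGGAACCTGCGGGCAGTAGGTGGAAT\n',
]

ids, seqs = parse_fasta(sample)
print(ids)
print(seqs)
```

Passing an open file object instead of `sample` should behave the same way, since iterating over a file also yields one line at a time.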

Upvotes: 1

TigerhawkT3

Reputation: 49310

Each iteration of the loop sees only one line of the file. So although the lines you append no longer end with a linefeed, you are still appending the file's lines one at a time. You have to tell the interpreter which lines to combine, for example by setting a flag when a DNAID line is read and clearing it once the first line of that record's sequence has been appended.

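DNA = open('DNAGCex.txt')
DNAID = []
DNASEQ = []
starting = False # True right after an ID line, until its first sequence line
for line in DNA:
    line = line.strip() # gets both sides
    if line.startswith('>'):
        starting = True
        DNAID.append(line)
    elif starting:
        starting = False
        DNASEQ.append(line) # first line of a new sequence
    else:
        DNASEQ[-1] += line # continuation of the current sequence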
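
If a sequence spans many lines, the repeated `+=` can be replaced by collecting each record's pieces in a list and joining them once at the end. This is a different technique from the flag shown above, offered only as a sketch: `join_records` is a hypothetical name, the sample lines are made up, and it assumes every sequence line is preceded by a `>` header.

```python
def join_records(lines):
    """Return (ids, seqs), joining each record's lines with str.join."""
    ids = []
    chunks = []  # one list of sequence pieces per ID
    for line in lines:
        line = line.strip()
        if line.startswith('>'):
            ids.append(line)
            chunks.append([])        # new record, no pieces yet
        elif line:
            chunks[-1].append(line)  # piece of the current record
    seqs = [''.join(parts) for parts in chunks]
    return ids, seqs


# Made-up input: the first record spans two lines, the second spans one.
ids, seqs = join_records(['>id_1\n', 'ACGT\n', 'TTAA\n', '>id_2\n', 'GGCC\n'])
print(ids)
print(seqs)
```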

Upvotes: 1
