Reputation: 13
I'm trying to append lines to an empty list reading from a file, and I've already stripped the lines of returns and newlines, but what should be one line is being entered as two separate items into the list.
DNA = open('DNAGCex.txt')
DNAID = []
DNASEQ = []
for line in DNA:
    line = line.rstrip()
    line = line.lstrip()
    if line.startswith('>')==True:
        DNAID.append(line)
    if line.startswith('>')==False:
        DNASEQ.append(line)
print DNAID
print DNASEQ
And here's the output:
['>Rosalind_6404', '>Rosalind_5959', '>Rosalind_0808']
['CCTGCGGAAGATCGGCACTAGA', 'TCCCACTAATAATTCTGAGG', 'CCATCGGTAGCGCATCCTTAGTCCA', 'ATATCCATTTGTCAGCAGACACGC', 'CCACCCTCGTGGTATGGCTAGGCATTCAG', 'TGGGAACCTGCGGGCAGTAGGTGGAAT']
I want it to look like this:
['>Rosalind_6404', '>Rosalind_5959', '>Rosalind_0808']
['CCTGCGGAAGATCGGCACTAGATCCCACTAATAATTCTGAGG', 'CCATCGGTAGCGCATCCTTAGTCCAATATCCATTTGTCAGCAGACACGC', 'CCACCCTCGTGGTATGGCTAGGCATTCAGTGGGAACCTGCGGGCAGTAGGTGGAAT']
Here is the source material (the contents of DNAGCex.txt):
>Rosalind_6404
CCTGCGGAAGATCGGCACTAGA
TCCCACTAATAATTCTGAGG
>Rosalind_5959
CCATCGGTAGCGCATCCTTAGTCCA
ATATCCATTTGTCAGCAGACACGC
>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAG
TGGGAACCTGCGGGCAGTAGGTGGAAT
Upvotes: 1
Views: 103
Reputation: 13158
You can combine the .lstrip() and .rstrip() into a single .strip() call.
Then, you were expecting .append() to both add lines to a list and join lines into a single line, but it only adds a new item. Here, we append an empty string to DNASEQ each time a header line appears and use += to join the following lines onto that last entry:
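DNA = open('DNAGCex.txt')
DNAID = []
DNASEQ = []
for line in DNA:
    line = line.strip()
    if line.startswith('>'):
        # header line: record the ID and start a new, empty sequence
        DNAID.append(line)
        DNASEQ.append('')
    else:
        # sequence line: extend the sequence of the most recent header
        DNASEQ[-1] += line
print DNAID
print DNASEQ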
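As a variation on the same idea, here is a minimal sketch that collects the fragments of each record in a list and joins them once at the end, which avoids rebuilding a string on every +=. The function name read_fasta and the in-memory sample list are made up for illustration; the sample lines are copied from the question. It is written so that it also runs under Python 3.

```python
def read_fasta(lines):
    """Return (ids, seqs) from an iterable of FASTA-style lines.

    read_fasta is a hypothetical helper name, not part of any library.
    """
    ids = []
    parts = []  # one list of sequence fragments per header
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            ids.append(line)
            parts.append([])
        elif parts:
            # ignore sequence lines that appear before any header
            parts[-1].append(line)
    seqs = [''.join(p) for p in parts]
    return ids, seqs


# Sample lines taken from the question, standing in for the open file.
sample = [
    '>Rosalind_6404\n',
    'CCTGCGGAAGATCGGCACTAGA\n',
    'TCCCACTAATAATTCTGAGG\n',
    '>Rosalind_5959\n',
    'CCATCGGTAGCGCATCCTTAGTCCA\n',
    'ATATCCATTTGTCAGCAGACACGC\n',
]
ids, seqs = read_fasta(sample)
```

Since an open file is also an iterable of lines, passing the file object from open('DNAGCex.txt') in place of sample should work the same way.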
Upvotes: 1
Reputation: 49310
Within each iteration of the loop, you're only looking at a single line from the file. So although the lines you append have no linefeed at the end, you're still appending the file's lines one at a time. You'll have to tell the program which lines to combine, by doing something like setting a flag when a DNAID line is read, so that the first sequence line after it starts a new DNASEQ entry and clears the flag, and every later line is joined onto that entry until the next DNAID starts.
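DNA = open('DNAGCex.txt')
DNAID = []
DNASEQ = []
starting = False  # True right after a header, until its first sequence line
for line in DNA:
    line = line.strip() # gets both sides
    if line.startswith('>'):
        starting = True
        DNAID.append(line)
    elif starting:
        # first sequence line of this record: start a new entry
        starting = False
        DNASEQ.append(line)
    else:
        # continuation line: join onto the current entry
        DNASEQ[-1] += line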
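The flag is really just marking where one run of sequence lines ends and the next begins. As a sketch of another way to express that grouping, itertools.groupby from the standard library can split the lines into consecutive runs of headers and non-headers. The function name group_records and the sample list are invented for this example, and it assumes every header is followed by at least one sequence line (otherwise the two lists would fall out of step).

```python
from itertools import groupby


def group_records(lines):
    """Split FASTA-style lines into (ids, seqs) using runs of like lines.

    group_records is a hypothetical name used only for this sketch.
    """
    ids, seqs = [], []
    stripped = (line.strip() for line in lines)
    non_blank = (line for line in stripped if line)
    # groupby yields consecutive runs that share the same key value
    for is_header, run in groupby(non_blank, key=lambda s: s.startswith('>')):
        if is_header:
            ids.extend(run)
        else:
            seqs.append(''.join(run))
    return ids, seqs


# Sample lines taken from the question, standing in for the open file.
sample = [
    '>Rosalind_0808\n',
    'CCACCCTCGTGGTATGGCTAGGCATTCAG\n',
    'TGGGAACCTGCGGGCAGTAGGTGGAAT\n',
]
ids, seqs = group_records(sample)
```

The explicit flag is easier to follow line by line; the groupby version is shorter but hides the state, so which one reads better is a matter of taste.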
Upvotes: 1