Reputation: 151
I have two files, and I am trying to append the strings from the last column of the second file to a list nested inside a list that holds the information from the first file. I want a string to be appended only if the number in the second column of the second file falls between the numbers in the second and third columns of the first file.
Here are my files:
reads.bed:
chromA 10 69 read1
chromA 10 35 read2
chromA 10 55 read3
chromA 15 69 read4
chromA 80 119 read5
chromA 80 111 read6
chromA 90 119 read7
chromA 101 119 read8
feats.bed:
chromA 10 19 feat1
chromA 30 39 feat2
chromA 50 69 feat3
chromA 80 89 feat4
chromA 100 119 feat5
Here is my code:
feat_bed=open("feats.bed","r")
read_bed=open("reads.bed","r")
read_coords=[]
for line in read_bed.readlines():
    line=line.strip()
    line=line.split("\t")
    read_coords.append([int(line[1]),int(line[2]),str(line[3]),[]])
for read in read_coords:
    for feat in feat_bed.readlines():
        feat=feat.strip()
        feat=feat.split("\t")
        if int(read[1]) > int(feat[1]) >= int(read[0]):
            read[3].append(str(feat[3]))
    print read
My expected output would be:
[10, 69, 'read1', ['feat1', 'feat2', 'feat3']]
[10, 35, 'read2', ['feat1', 'feat2']]
[10, 55, 'read3', ['feat1', 'feat2', 'feat3']]
[15, 69, 'read4', ['feat2', 'feat3']]
[80, 119, 'read5', ['feat4', 'feat5']]
[80, 111, 'read6', ['feat4', 'feat5']]
[90, 119, 'read7', ['feat5']]
[101, 119, 'read8', []]
Instead, my inner for loop seems to iterate only the first time, and then it stops, so my actual output is:
[10, 69, 'read1', ['feat1', 'feat2', 'feat3']]
[10, 35, 'read2', []]
[10, 55, 'read3', []]
[15, 69, 'read4', []]
[80, 119, 'read5', []]
[80, 111, 'read6', []]
[90, 119, 'read7', []]
[101, 119, 'read8', []]
I don't understand why my inner loop stops iterating after the first iteration of my outer loop. If someone could point out what I'm doing wrong that would be super helpful. Thanks.
Upvotes: 1
Views: 74
Reputation: 19414
This happens because readlines() reads all lines from the current position in the file. So after the first call to readlines(), the file pointer is at the end of the file and all subsequent calls to readlines() will return an empty list.
You want to save the lines to a list beforehand, like feat_lines = feat_bed.readlines(), and then iterate over that pre-saved list of lines, like: for feat in feat_lines:
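To see the behaviour in isolation, here is a minimal sketch in Python 3 (the question uses Python 2's print statement). It assumes io.StringIO as a stand-in for an open text file, so nothing needs to exist on disk, and it includes only two of the feature rows. The seek(0) call at the end is an alternative that is not part of the answer above: it rewinds the file so that it can be read again.

```python
import io

# io.StringIO stands in for an open text file (an assumption for this sketch)
feat_bed = io.StringIO("chromA\t10\t19\tfeat1\nchromA\t30\t39\tfeat2\n")

first_call = feat_bed.readlines()   # reads everything up to the end of the file
second_call = feat_bed.readlines()  # the position is already at the end

print(len(first_call))   # 2
print(second_call)       # []

# Option 1: keep the list from the first call and loop over it as often as needed
feat_lines = first_call
for _ in range(3):
    names = [line.strip().split("\t")[3] for line in feat_lines]
print(names)             # ['feat1', 'feat2']

# Option 2: rewind the file before reading it again
feat_bed.seek(0)
third_call = feat_bed.readlines()
print(len(third_call))   # 2
```

Saving the list is usually the simpler choice here, because the feature file is then read and split from disk only once, however many reads there are.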
Upvotes: 1
Reputation: 1837
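Using nested loops with indentation, and reading the feature lines only once (calling feat_bed.readlines() inside the outer loop would return an empty list from the second read onwards):
feat_bed=open("feats.bed","r")
read_bed=open("reads.bed","r")
# Read the feature lines once, before the loop over the reads
feat_lines=feat_bed.readlines()
read_coords=[]
for line in read_bed.readlines():
    line=line.strip()
    line=line.split("\t")
    read = [int(line[1]),int(line[2]),str(line[3]),[]]
    for feat in feat_lines:
        feat=feat.strip()
        feat=feat.split("\t")
        if int(read[1]) > int(feat[1]) >= int(read[0]):
            read[3].append(str(feat[3]))
    read_coords.append(read)
    print read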
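For comparison, a self-contained sketch of the same nested-loop idea in Python 3, with the feature lines parsed once before the loop over the reads. The function name annotate_reads and the READS and FEATS lists are hypothetical and only for illustration. The rows are copied from the question and assumed to be tab-separated, as the split("\t") in the original code implies.

```python
def annotate_reads(read_lines, feat_lines):
    """Return [start, end, name, [feature names]] for each read line."""
    # Parse the features once, before looping over the reads.
    feats = []
    for line in feat_lines:
        fields = line.strip().split("\t")
        feats.append((int(fields[1]), fields[3]))

    annotated = []
    for line in read_lines:
        fields = line.strip().split("\t")
        start, end, name = int(fields[1]), int(fields[2]), fields[3]
        # Same test as in the question: read start <= feature start < read end.
        hits = [feat_name for feat_start, feat_name in feats
                if start <= feat_start < end]
        annotated.append([start, end, name, hits])
    return annotated


# Sample rows copied from the question (hypothetical in-memory stand-ins for the files).
READS = [
    "chromA\t10\t69\tread1\n",
    "chromA\t10\t35\tread2\n",
    "chromA\t10\t55\tread3\n",
    "chromA\t15\t69\tread4\n",
    "chromA\t80\t119\tread5\n",
    "chromA\t80\t111\tread6\n",
    "chromA\t90\t119\tread7\n",
    "chromA\t101\t119\tread8\n",
]
FEATS = [
    "chromA\t10\t19\tfeat1\n",
    "chromA\t30\t39\tfeat2\n",
    "chromA\t50\t69\tfeat3\n",
    "chromA\t80\t89\tfeat4\n",
    "chromA\t100\t119\tfeat5\n",
]

for read in annotate_reads(READS, FEATS):
    print(read)
```

With the real files, the two lists could come from something like with open("feats.bed") as f: feat_lines = f.readlines(), which also closes each file once it has been read.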
Upvotes: 0