Reputation: 3
I am trying to look up the names from file 1 in file 2 and merge some data on the matching lines.
file1:
A 28 sep 1980
B 28 jan 1985
C 25 feb 1990
D 27 march 1995
and file2:
A hyd
B alig
C slg
D raj
Using this:
import sys
data1 = open(sys.argv[1]).read().rstrip('\n')
data2 = open(sys.argv[2]).read().rstrip('\n')
list1 = data1.split('\n')
list2 = data2.split('\n')
for line in list1:
    for item in list2:
        if line.split('\t')[0] in item.split('\t')[0]:
            print(item,'\t',line.split('\t')[3])
Result:
A hyd 1980
B alig 1985
C slg 1990
D raj 1995
Two questions (for clarifying the concept):
1 - I expected that changing the order of the lines in file2 would give me fewer matches, but I still get all the matches. Why?
2 - Although this program serves the purpose, how memory-efficient is it expected to be? Please suggest improvements.
Thanks
Upvotes: 0
Views: 205
Reputation: 799024
1 - I expected that changing the order of the lines in file2 would give me fewer matches, but I still get all the matches. Why?
Your program compares every line of file1 against every line of file2 (a full cross join), so the order of the lines in either file does not matter: every matching pair is always found.
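To see why the order is irrelevant, here is a small sketch of the same nested-loop logic wrapped in a function. The function name `cross_join_matches` and the in-memory lists are my own stand-ins for the files, and I am assuming the columns are tab-separated as in the question's code.

```python
def cross_join_matches(lines1, lines2):
    """Compare every line of lines1 with every line of lines2 (nested loops)."""
    matches = []
    for line in lines1:
        key = line.split('\t')[0]
        for item in lines2:
            # Same substring test as in the question's code.
            if key in item.split('\t')[0]:
                matches.append((item, line.split('\t')[3]))
    return matches

# Sample data from the question, assumed to be tab-separated.
file1 = ["A\t28\tsep\t1980", "B\t28\tjan\t1985",
         "C\t25\tfeb\t1990", "D\t27\tmarch\t1995"]
file2 = ["A\thyd", "B\talig", "C\tslg", "D\traj"]

original = cross_join_matches(file1, file2)
reordered = cross_join_matches(file1, list(reversed(file2)))
print(original == reordered)  # True: reordering file2 changes nothing
```

The outer loop fixes the output order and the inner loop always scans all of file2, so reversing file2 gives the same four matches.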
2 - Although this program serves the purpose, how memory-efficient is it expected to be? Please suggest improvements.
Awful: it reads both files completely into memory. Read only the shorter file into memory and iterate over the lines of the longer one once.
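# 'littlefile.txt' and 'bigfile.txt' are placeholder names
with open('littlefile.txt', 'r') as littlefile:
    littlefiledata = littlefile.read().splitlines()

with open('bigfile.txt', 'r') as bigfile:
    for bigline in bigfile:
        for littleline in littlefiledata:
            ...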
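If the names are unique, you could go one step further and replace the inner loop with a dictionary lookup. This is only a sketch under a few assumptions: the function name `merge_by_name` is made up, file2 is treated as the smaller file, the year is the last tab-separated field, and names are compared for exact equality rather than with the substring test (`in`) from the question.

```python
import io

def merge_by_name(small_file, big_file):
    """Yield 'name<TAB>data<TAB>year' for names present in both inputs."""
    # Load only the smaller input into a dict: name -> rest of the line.
    lookup = {}
    for line in small_file:
        name, _, rest = line.rstrip('\n').partition('\t')
        lookup[name] = rest
    # Stream the larger input one line at a time; nothing else is kept.
    for line in big_file:
        fields = line.rstrip('\n').split('\t')
        if fields[0] in lookup:  # exact key match, not a substring test
            yield '\t'.join([fields[0], lookup[fields[0]], fields[-1]])

# io.StringIO stands in for real file objects returned by open().
file2 = io.StringIO("A\thyd\nB\talig\nC\tslg\nD\traj\n")
file1 = io.StringIO("A\t28\tsep\t1980\nB\t28\tjan\t1985\n"
                    "C\t25\tfeb\t1990\nD\t27\tmarch\t1995\n")

merged = list(merge_by_name(file2, file1))
for row in merged:
    print(row)
```

With real files you would pass the objects from `open(sys.argv[2])` and `open(sys.argv[1])` instead. Each line of the big file is then looked up once instead of being compared against every line of the small file.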
Upvotes: 1