Reputation: 69
I have two files:
file A:
U1
U2
U3
file B:
U1hjg 444 77 AGT
U8jha 777 33 AKS
U2jsj 772 00 AKD
U55sks 888 02 SJD
U3jsj 666 32 JSJ
Then I have two lists:
ListA=open("file A").readlines()
ListB=open("file B").readlines()
I would like to check, for each member of ListA, whether it is present in ListB, and write two files: one with the lines of file B that match (ordered by ListA), and the other with the lines of file B that do not match. Desired output:
file list_match:
U1hjg 444 77 AGT
U2jsj 772 00 AKD
U3jsj 666 32 JSJ
file list_unmatched:
U8jha 777 33 AKS
U55sks 888 02 SJD
I am a complete beginner, so I started by trying this as an example:
print(ListA[1])
print(ListB[2])
if ListA[1] in ListB[2]:
    print("yes")
And the output is:
U2
U2jsj 772 00 AKD
But the "yes" is not printed
But if I do:
if "U2" in ListB[2]:
    print("yes")
The output is:
yes
I do not understand where the error is. Could someone please help me?
Upvotes: 1
Views: 54
Reputation: 180391
st = set(list_b)
matches = [line for line in list_a if line in st]
To get both:
# with will close your files automatically
with open("file A") as f1, open("file B") as f2:
    st = set(f2)  # get set of all lines in file b
    matches = []
    diff = []
    for line in f1:  # iterate over every line in file a
        if line in st:  # if line is in file b add it to our matches
            matches.append(line)
        else:  # else add it to our diff list
            diff.append(line)
If you want to create two new files instead of appending to lists, just write the lines:
with open("file A") as f1, open("file B") as f2, open("matches.txt", "w") as mat, open("diff.txt", "w") as diff:
    st = set(f1)
    for line in f2:
        if line in st:
            mat.write(line)
        else:
            diff.write(line)
In your own example, you just need ListA[1].rstrip() in ListB[2]. There is a newline character at the end of ListA[1], and at the end of every line except the last. If you print(repr(ListA[1])) you will see exactly what is there.
Printing our set and each line as we iterate you can see the newlines at the end:
{'U2\n', 'U3', 'U1\n'} <-st
# print(repr(line)) on all lines from fileB
'file B:\n'
'U1hjg 444 77 AGT\n'
'U8jha 777 33 AKS\n'
'U2jsj 772 00 AKD\n'
'U55sks 888 02 SJD\n'
'U3jsj 666 32 JSJ'
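Putting the rstrip() fix together with the output the question asks for, a minimal sketch might look like the following. The name split_matches and its variables are hypothetical, and it assumes a line of file B counts as a match when it contains a stripped line of file A as a substring, which is the same "in" test the question uses.

```python
def split_matches(keys_lines, data_lines):
    """Return (matched, unmatched) lines of data_lines.

    Hypothetical helper: a data line counts as a match when it contains
    one of the stripped key lines as a substring.
    """
    # Strip newlines and skip blank lines (an empty key would match everything).
    keys = [k.strip() for k in keys_lines if k.strip()]
    matched = []
    used = set()
    # Outer loop over the keys so matches come out in file A order.
    for key in keys:
        for i, line in enumerate(data_lines):
            if i not in used and key in line:
                matched.append(line)
                used.add(i)
    # Everything that no key claimed, in file B order.
    unmatched = [line for i, line in enumerate(data_lines) if i not in used]
    return matched, unmatched


list_a = ["U1\n", "U2\n", "U3"]
list_b = [
    "U1hjg 444 77 AGT\n",
    "U8jha 777 33 AKS\n",
    "U2jsj 772 00 AKD\n",
    "U55sks 888 02 SJD\n",
    "U3jsj 666 32 JSJ\n",
]
matched, unmatched = split_matches(list_a, list_b)
```

The two lists can then be written out with writelines(). Note that a substring test is loose: a key such as "U1" would also match a line starting with "U10", so line.startswith(key) or a comparison on the first field may be safer if the identifiers can overlap.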
Upvotes: 2
Reputation: 76847
This happens because readlines() gives you each line with the terminating \n character. Hence, when you do
if ListA[1] in ListB[2]:
    print("yes")
you are essentially checking whether "U2\n" is in "U2jsj 772 00 AKD\n", which evaluates to False. But since "U2" is in fact present, it prints "yes" when you use the literal.
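The comparison can be reproduced without any files, using the two string values from the question directly (the variable names here are only for illustration):

```python
key = "U2\n"                  # what readlines() returns for the second line of file A
line = "U2jsj 772 00 AKD\n"   # the corresponding line of file B

with_newline = key in line              # False: "U2\n" is not a substring of the line
without_newline = key.rstrip() in line  # True: "U2" is a substring of the line
print(with_newline, without_newline)
```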
You can verify the same in sample program below:
$ cat text.txt
Sample
Text
Here.
$ cat test.py
with open("text.txt", "r") as f:
    text = f.readlines()
print(text)
print(text[0])
$ python test.py
['Sample\n', 'Text\n', 'Here.\n']
Sample
$ #prompt
To correct this, if your files are huge, strip each line as you go, for example with ListA[1].rstrip().
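For the large-file case, a rough sketch of that idea could look like this. The function name stream_split and its parameters are hypothetical; it assumes the key file (file A) is small enough to keep in memory while file B is read one line at a time, and that a match means a stripped key occurs as a substring of the line.

```python
def stream_split(keys_path, data_path, match_path, rest_path):
    """Hypothetical helper: split data_path into two files line by line."""
    with open(keys_path) as keys_file:
        # The key file is assumed small; strip newlines and skip blank lines.
        keys = [k.strip() for k in keys_file if k.strip()]
    with open(data_path) as data, \
            open(match_path, "w") as match, \
            open(rest_path, "w") as rest:
        for line in data:  # the file object yields one line at a time
            target = match if any(key in line for key in keys) else rest
            target.write(line)  # line still carries its own newline
```

Because file B is streamed, both output files keep file B's order rather than file A's.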
Otherwise, for smaller files, you can use .read(), split on "\n" to create a list of lines, and use list comprehensions:
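with open("file A") as f1, open("file B") as f2:
    # drop empty strings: "" is a substring of every line and would match everything
    s1 = [y for y in f1.read().split("\n") if y]
    s2 = [x for x in f2.read().split("\n") if x]
with open("matching.txt", "w") as match, open("non-matching.txt", "w") as no_match:
    # a line matches if any key from file A occurs in it; otherwise it is non-matching
    matching = [x for x in s2 if any(y in x for y in s1)]
    non_matching = [x for x in s2 if not any(y in x for y in s1)]
    # split("\n") removed the newlines, so add them back when writing
    for line in matching:
        match.write(line + "\n")
    for line in non_matching:
        no_match.write(line + "\n")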
Upvotes: 1