UserYmY

Reputation: 8564

How to compare one text file with several others with Python?

I have written the code below to compare one file (f) with several other files in my path. However, it only prints the result for one file. Any suggestions on how to run the comparison against every file and print all of the results?

import difflib
import fnmatch
import os

filelist=[]
f= open("D:/Desktop/data/sample/ff69c.txt")
flines= f.readlines()
path="D:/Desktop/data/sample/sample2"
for root, dirnames, filenames in os.walk(path):  
    for filename in fnmatch.filter(filenames, '*.txt'):   
        filelist.append(os.path.join(root, filename))

for m in filelist:
    g=open(m,'r')
    glines= g.readlines()
    # g.close()
    d = difflib.Differ()
    diff_list = list(d.compare(flines, glines))

#print("".join(diff))
n_adds, n_subs, n_eqs, n_wiered = 0, 0, 0, 0

for diff_item in diff_list:
    if diff_item[0] == '+':
        n_adds += 1
    elif diff_item[0] == '-':
        n_subs +=1 
    elif diff_item[0] == ' ':
        n_eqs += 1
    else: 
        n_wiered += 1

print 'lines files #1: %d  #2: %d' % (len(flines), len(glines))
print 'adds: %d subs: %d eqs: %d ?:%d '  % (n_adds, n_subs, n_eqs, n_wiered)

Upvotes: 0

Views: 108

Answers (2)

jbat100

Reputation: 16827

If you only need to know whether the files are identical, you can use filecmp.cmp, which avoids reading all of their content into memory with readlines. From the documentation:

filecmp.cmp(f1, f2[, shallow])

Compare the files named f1 and f2, returning True if they seem equal, False otherwise.

Unless shallow is given and is false, files with identical os.stat() signatures are taken to be equal. Files that were compared using this function will not be compared again unless their os.stat() signature changes. Note that no external programs are called from this function, giving it portability and efficiency.
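A minimal sketch of this approach applied to the question's setup, assuming you only need an identical/different verdict per file (compare_with_all is a hypothetical helper name; the directory walk mirrors the question's os.walk/fnmatch loop). It passes shallow=False so the contents are actually compared instead of trusting matching os.stat() signatures:

```python
import filecmp
import fnmatch
import os


def compare_with_all(reference, folder, pattern="*.txt"):
    """Return {path: bool} telling whether each file under `folder`
    matching `pattern` has the same contents as `reference`.
    (Hypothetical helper, not part of the filecmp module.)"""
    results = {}
    for root, _dirnames, filenames in os.walk(folder):
        for filename in fnmatch.filter(filenames, pattern):
            path = os.path.join(root, filename)
            # shallow=False: compare bytes rather than only os.stat() info
            results[path] = filecmp.cmp(reference, path, shallow=False)
    return results
```

You could then loop over the returned dict, e.g. compare_with_all("D:/Desktop/data/sample/ff69c.txt", "D:/Desktop/data/sample/sample2"), and print one line per file. Note this only says whether files differ, not how many lines were added or removed.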

Also, to explore all the file combinations, you can use itertools.combinations (with r=2):

itertools.combinations(iterable, r)

Return r length subsequences of elements from the input iterable.

Combinations are emitted in lexicographic sort order. So, if the input iterable is sorted, the combination tuples will be produced in sorted order.
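Combining the two, here is a sketch that compares every pair of files exactly once (compare_all_pairs is a hypothetical name; paths can be any list of file paths, such as the filelist built in the question):

```python
import filecmp
import itertools


def compare_all_pairs(paths):
    """Compare each unordered pair of files once.
    Returns a list of (path_a, path_b, identical) tuples.
    (Hypothetical helper for illustration.)"""
    results = []
    # combinations(paths, 2) never yields (x, x) or both (a, b) and (b, a)
    for a, b in itertools.combinations(paths, 2):
        results.append((a, b, filecmp.cmp(a, b, shallow=False)))
    return results
```

For three files a, b and c, combinations(paths, 2) yields (a, b), (a, c) and (b, c), so no pair is compared twice and no file is compared with itself.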

Upvotes: 1

Vikram Saran

Reputation: 1143

diff_list is overwritten on every pass through the loop, so once the loop finishes it only holds the diff for the last file.

Try appending to diff_list rather than overwriting it with this line:

diff_list = list(...)
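One way to apply this suggestion, sketched under the assumption that you want a separate summary for each file: append one (path, line counts, diff) entry per file, then count and print per entry. The helper names are hypothetical, and the code uses the Python 3 print function plus with blocks so every file gets closed:

```python
import difflib
import fnmatch
import os


def diff_against_all(reference, folder, pattern="*.txt"):
    """Diff `reference` against every matching file under `folder`.
    Returns a list of (path, n_ref_lines, n_other_lines, diff) tuples;
    one entry is appended per file, so nothing is overwritten."""
    with open(reference) as f:
        flines = f.readlines()
    diff_list = []
    for root, _dirnames, filenames in os.walk(folder):
        for filename in fnmatch.filter(filenames, pattern):
            path = os.path.join(root, filename)
            with open(path) as g:  # file is closed after reading
                glines = g.readlines()
            diff = list(difflib.Differ().compare(flines, glines))
            diff_list.append((path, len(flines), len(glines), diff))
    return diff_list


def count_prefixes(diff):
    """Tally Differ output lines by first character:
    '+' added, '-' removed, ' ' unchanged, '?' intraline hint."""
    counts = {"+": 0, "-": 0, " ": 0, "?": 0}
    for item in diff:
        counts[item[0]] += 1
    return counts


def report(reference, folder):
    """Print one summary per compared file."""
    for path, n_f, n_g, diff in diff_against_all(reference, folder):
        c = count_prefixes(diff)
        print(path)
        print("lines files #1: %d  #2: %d" % (n_f, n_g))
        print("adds: %d subs: %d eqs: %d ?:%d" % (c["+"], c["-"], c[" "], c["?"]))
```

Calling report("D:/Desktop/data/sample/ff69c.txt", "D:/Desktop/data/sample/sample2") prints a summary for each file. If you instead extended a single flat list with every file's diff lines, the counts would be totals across all files rather than per file.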

Upvotes: 2
