Reputation: 57
I need to compare the file names of two directories A and B.
A contains more files than B (around 15000/20000 respectively); files with the same name can have different content.
I have:
dirA: 'doctor_Weiss.csv', 'doctor_Urlici.csv', 'doctor_Basler J. Rudolph.csv'
dirB: 'doctor_Weiss.csv', 'doctor_Urlici.csv'
I need all the files in dirA-dirB (from dirA):
diffAB: 'doctor_Basler J. Rudolph.csv'
I tried:
import os
from os.path import join
fpA = {}
for root, dirs, files in os.walk('C:\A\docs'):
    for name in files:
        fpA[name] = 1
fpB = {}
for root, dirs, files in os.walk('C:\B\docs'):
    for name in files:
        fpB[name] = 1
a = []
for name in fpA.keys():
    if not(name in fpB.keys()):
        a.append(name)
That didn't work: 'a' contains all the files from B and not just A-B.
I also tried traversing both directories and building sets of file names, but that did not work either (again, all the files from B).
Thanks for your help
Upvotes: 2
Views: 184
Reputation: 7303
You need to escape backslashes in path names! And, as suggested, rename fpa to fpA and fpb to fpB. Then your example will work.
import os
from os.path import join
fpA = {}
for root, dirs, files in os.walk('C:\\A\\docs'):  # <- escape backslash
    for name in files:
        fpA[name] = 1
fpB = {}
for root, dirs, files in os.walk('C:\\B\\docs'):  # <- escape backslash
    for name in files:
        fpB[name] = 1
a = []
for name in fpA.keys():
    if not(name in fpB.keys()):
        a.append(name)
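As a side note on the path literals: a raw string is another way to write the same Windows path without doubling each backslash. A minimal sketch (the variable names are only illustrative, not from the original code):

```python
# Two spellings of the same Windows path.
escaped = 'C:\\A\\docs'   # each backslash doubled
raw = r'C:\A\docs'        # raw string: backslashes are taken literally

# Both literals produce exactly the same string.
same = escaped == raw
print(same)
```

Forward slashes ('C:/A/docs') are also generally accepted by Python's file functions on Windows, which sidesteps escaping entirely.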
Upvotes: 0
Reputation: 4670
How about this one?
>>> from os import listdir
>>> set(listdir(dirA)).difference(listdir(dirB))
os.listdir returns all the entries in the given directory (it does not descend into subdirectories); set.difference() then gives the names that are in dirA but not in dirB.
Upvotes: 1
Reputation: 1857
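You can use a set to get the difference between the two file lists this way.
import os
from os.path import isfile, join
# dirA and dirB are the paths of the two directories to compare.
# isfile() needs the full path; a bare name would be checked against
# the current working directory instead.
list_A = [x for x in os.listdir(dirA) if isfile(join(dirA, x))]
list_B = [x for x in os.listdir(dirB) if isfile(join(dirB, x))]
diff = set(list_A) - set(list_B)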
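If the files sit in subdirectories, as the os.walk calls in the question suggest, the same set idea can be combined with a recursive walk. This is only a sketch under that assumption; file_names and names_only_in are hypothetical helper names, and files are compared by bare name only, ignoring which subfolder they are in:

```python
import os

def file_names(top):
    """Return the set of file names found anywhere under `top`."""
    names = set()
    for _root, _dirs, files in os.walk(top):
        names.update(files)
    return names

def names_only_in(dir_a, dir_b):
    """Return names of files present under dir_a but not under dir_b."""
    return file_names(dir_a) - file_names(dir_b)
```

Calling names_only_in(dirA, dirB) with the two directory paths would then give the A-B set from the question.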
Upvotes: 4
Reputation: 4047
In the last for loop, you wrote fpa.keys() instead of fpA.keys() and fpb.keys() instead of fpB.keys(). Use the appropriate variable names, and it will work. It is working for me.
Upvotes: 1