user2278505

Reputation: 57

Compare directories on file names

I need to compare the file names of two directories A and B.

A contains more files than B (around 15000/20000 respectively); files with the same name can have different content.

I have:

dirA: 'doctor_Weiss.csv', 'doctor_Urlici.csv', 'doctor_Basler J. Rudolph.csv'

dirB: 'doctor_Weiss.csv', 'doctor_Urlici.csv'

I need all the files in dirA-dirB (from dirA):

diffAB: 'doctor_Basler J. Rudolph.csv'

I tried:

import os
from os.path import join

fpA = {}
for root, dirs, files in os.walk('C:\A\docs'):
    for name in files:
        fpA[name] = 1
fpB = {}
for root, dirs, files in os.walk('C:\B\docs'):
    for name in files:
        fpB[name] = 1

a = []
for name in fpA.keys():
    if not(name in fpB.keys()):
        a.append(name)

This didn't work: 'a' contains all the files from B, not just A-B.

I also tried traversing both directories and building sets of file names, but that did not work either (again, I got all the files from B).

Thanks for your help

Upvotes: 2

Views: 184

Answers (4)

wolfrevo

Reputation: 7303

You need to escape the backslashes in the path names. Also, as suggested, rename fpa to fpA and fpb to fpB. Then your example will work.

import os

# Record every file name found under A (the dict is used like a set).
fpA = {}
for root, dirs, files in os.walk('C:\\A\\docs'):  # <- escape backslash
    for name in files:
        fpA[name] = 1
# Do the same for B.
fpB = {}
for root, dirs, files in os.walk('C:\\B\\docs'):  # <- escape backslash
    for name in files:
        fpB[name] = 1

# Keep the names that occur under A but not under B.
a = []
for name in fpA:
    if name not in fpB:
        a.append(name)
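
As an alternative to doubling every backslash, a raw string literal spells the same path. A minimal sketch, reusing the path from the question; the variable names are only for illustration:

```python
# In a raw string literal (r'...') backslashes are kept as written,
# so Windows paths do not need doubled backslashes.
escaped = 'C:\\A\\docs'
raw = r'C:\A\docs'

# Both literals produce the same string.
same = (escaped == raw)
print(same)
```

Either spelling can be passed to os.walk unchanged.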

Upvotes: 0

lord63. j

Reputation: 4670

How about this one?

>>> from os import listdir
>>> set(listdir(dirA)).difference(listdir(dirB))

os.listdir returns the names of all entries directly inside the given directory (files and subdirectories alike, without recursing); set.difference() then gives the names that are in dirA but not in dirB.
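
Since the question walks the directory trees with os.walk, a recursive variant of the same set-difference idea might look like the sketch below. The helper names file_names and only_in_first are hypothetical, and temporary directories stand in for the real dirA and dirB:

```python
import os
import tempfile

def file_names(top):
    """Collect the base names of all files below `top`, recursively."""
    names = set()
    for _root, _dirs, files in os.walk(top):
        names.update(files)
    return names

def only_in_first(dir_a, dir_b):
    """Return the file names found under dir_a but not under dir_b."""
    return file_names(dir_a) - file_names(dir_b)

# Throwaway demo directories stand in for the real dirA and dirB.
with tempfile.TemporaryDirectory() as dir_a, tempfile.TemporaryDirectory() as dir_b:
    for name in ('doctor_Weiss.csv', 'doctor_Urlici.csv', 'doctor_Basler J. Rudolph.csv'):
        open(os.path.join(dir_a, name), 'w').close()
    for name in ('doctor_Weiss.csv', 'doctor_Urlici.csv'):
        open(os.path.join(dir_b, name), 'w').close()
    demo_diff = only_in_first(dir_a, dir_b)
    print(sorted(demo_diff))
```

Note that this compares names only, so files with the same name in different subdirectories count as one.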

Upvotes: 1

shaktimaan

Reputation: 1857

You can use sets to get the difference between the two file lists, like this:

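import os
from os.path import isfile, join

# isfile() needs the full path; a bare name would be checked against
# the current working directory instead of dirA or dirB.
list_A = [x for x in os.listdir(dirA) if isfile(join(dirA, x))]
list_B = [x for x in os.listdir(dirB) if isfile(join(dirB, x))]
diff = set(list_A) - set(list_B)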
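
A possible variant uses os.scandir, whose entries already know which directory they live in, so the is-file check does not depend on the current working directory. This is only a sketch; plain_file_names is a hypothetical helper and the temporary directory is demo data:

```python
import os
import tempfile

def plain_file_names(directory):
    """Return the names of regular files directly inside `directory`."""
    with os.scandir(directory) as entries:
        return {entry.name for entry in entries if entry.is_file()}

# Demo: one file and one subdirectory; only the file should be reported.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, 'doctor_Weiss.csv'), 'w').close()
    os.mkdir(os.path.join(demo_dir, 'archive'))
    demo_names = plain_file_names(demo_dir)
    print(demo_names)
```

The difference is then plain_file_names(dirA) - plain_file_names(dirB).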

Upvotes: 4

shruti1810

Reputation: 4047

In the last for loop, you wrote fpa.keys() instead of fpA.keys() and fpb.keys() instead of fpB.keys(). Use the appropriate variable names, and it will work. It is working for me.

Upvotes: 1
