Reputation: 1
I am trying to find some missing files; the files are supposed to come in pairs.
For example, we have files like: file1_LEFT file1_RIGHT file2_LEFT file2_RIGHT file3_LEFT file4_RIGHT ...
The idea is that the base name is the same, but each file has a LEFT/RIGHT counterpart. Normally we have thousands of files, but somewhere among them there will be some files without a pair. For example, file99_LEFT is present but its RIGHT counterpart is missing (or vice versa).
I'm trying to write a script in Python 2.7 (yes, I'm using an old Python for personal reasons, unfortunately), but I have no idea how this can be done. Ideas I've tried:
- Check the files two at a time: if the current file contains RIGHT and the previous one contains LEFT, print OK; otherwise print the file that doesn't match. But after the first unpaired file is printed, all the others fail, because from that point on LEFT and RIGHT are no longer next to each other; their order is shifted.
- Create separate lists for LEFT and RIGHT and compare them, but again the first unpaired file is found and it doesn't work for the others.
Code I've used so far:
import os
import fnmatch,re
path = raw_input('Enter files path:')
for path, dirname, filenames in os.walk(path):
    for fis in filenames:
        print fis
    print len(filenames)
    for i in range(1,len(filenames),2):
        print filenames[i]
        if "RIGHT" in filenames[i] and "LEFT" in filenames[i-1]:
            print "Ok"
        else:
            print "file >"+fis+"< has no pair"
            f = open(r"D:\rec.txt", "a")
            f.writelines(fis + "\n")
            f.close()
Thanks for your time!
Upvotes: 0
Views: 357
Reputation: 2797
We can use glob to list the files in a given path, filtered by a search pattern.
If we consider one set of all LEFT filenames and another set of all RIGHT filenames (each with the _LEFT or _RIGHT part removed), then you are looking for the elements that are not in the intersection of these two sets, i.e. the names that appear in exactly one of them.
That is called the "symmetric difference" of those two sets.
import glob
# Get a list of all _LEFT filenames (excluding the _LEFT part of the name)
# Eg: ['file1', 'file2' ... ].
# Ditto for the _RIGHT filenames
# Note: glob.glob() will look in the current directory where this script is running.
left_list = [x.replace('_LEFT', '') for x in glob.glob('*_LEFT')]
right_list = [x.replace('_RIGHT', '') for x in glob.glob('*_RIGHT')]
# Print the symmetric difference between the two lists
symmetric_difference = list(set(left_list) ^ set(right_list))
print symmetric_difference
# If you'd like to save the names of missing pairs to file
with open('rec.txt', 'w') as f:
    for pairname in symmetric_difference:
        print >> f, pairname
# If you'd like to print which file (LEFT or RIGHT) is missing a pair
for filename in symmetric_difference:
    if filename in left_list:
        print "file >" + filename + "_LEFT< has no pair"
    if filename in right_list:
        print "file >" + filename + "_RIGHT< has no pair"
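If the files are spread over subdirectories (the question walks the tree with os.walk), the same set idea can be wrapped in a function and applied per directory. The following is only a sketch under some assumptions: file names end exactly in _LEFT or _RIGHT with no extension, as in the question's example, and a pair always lives in the same directory. The names find_unpaired and find_unpaired_in_tree are made up for illustration. It sticks to syntax that runs on Python 2.7 as well as Python 3.

```python
import os

LEFT, RIGHT = '_LEFT', '_RIGHT'


def find_unpaired(filenames):
    """Return the sorted file names that have no LEFT/RIGHT partner."""
    lefts = set()
    rights = set()
    for name in filenames:
        # Strip the suffix only at the end of the name; str.replace()
        # would also remove a '_LEFT' that appears in the middle.
        if name.endswith(LEFT):
            lefts.add(name[:-len(LEFT)])
        elif name.endswith(RIGHT):
            rights.add(name[:-len(RIGHT)])
    # The two one-sided differences together make up the symmetric
    # difference, but keeping them apart tells us which side exists.
    unpaired = [base + LEFT for base in lefts - rights]
    unpaired += [base + RIGHT for base in rights - lefts]
    return sorted(unpaired)


def find_unpaired_in_tree(root):
    """Hypothetical helper: check each directory under root on its own."""
    result = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in find_unpaired(filenames):
            result.append(os.path.join(dirpath, name))
    return result


sample = ['file1_LEFT', 'file1_RIGHT', 'file2_LEFT', 'file2_RIGHT',
          'file3_LEFT', 'file4_RIGHT']
print(find_unpaired(sample))  # ['file3_LEFT', 'file4_RIGHT']
```

Because the result does not depend on the order in which the files are listed, one missing file cannot throw off the checks for the files that come after it, which is what went wrong with the two-by-two comparison.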
Upvotes: 2