Reputation: 612
I want to compare the contents of two folders and delete duplicated data. I already wrote a Bash script, but I don't think it's the right way to do it: it uses loops to iterate over the directory contents and runs a lot of diff commands, which makes it very time-consuming.
Here is the context.
I have two directories:
1. dir1/
       Student1/
           homework1/
           homework2/
       Student2/
           homework1/
           homework2/
2. dir2/
       Student1/
           homework1/
           homework2/
       Student3/
           homework1/
           homework2/
Suppose that Student1/homework1 contains the same data in dir1 and dir2, while Student1/homework2 contains different data in the two trees.
The output directory should contain:
Student1/
    homework1          // same name, same content ==> keep one homework
    homework2
    homework2_dir2     // same name, different content ==> add the _dir2 suffix
Student2/
    homework1
    homework2
Student3/
    homework1
    homework2
What do you think is the optimal way, in terms of time and reliability (file-name problems, etc.), to do this kind of operation?
Thank you ;)
PS: dir*, Student*, and homework* are all directories.
PS2: Please, I am not looking for an answer of this form:
loop over students
    loop over the student's homeworks
        test whether the homework exists
        diff the homework contents
        if different, copy
    end
end
If I have a lot of students and a lot of homeworks with only one difference (only one homework that differs), the script takes a long time with the approach above.
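To try the answers on the layout described above, the sample trees can be recreated with a small Bash script. This is a sketch under assumptions: each homework directory holds a single file named answer.txt (a hypothetical name), and the file contents are made up; only the same-versus-different relationships come from the question.

```bash
#!/bin/bash
# Hypothetical fixture: every homework directory holds one file, answer.txt.
# Student1/homework1 is identical in both trees; Student1/homework2 differs.
set -e

make_hw() {   # make_hw <tree> <student> <homework> <content>
    mkdir -p "$1/$2/$3"
    printf '%s\n' "$4" > "$1/$2/$3/answer.txt"
}

make_hw dir1 Student1 homework1 "s1 hw1"
make_hw dir1 Student1 homework2 "s1 hw2 (dir1 version)"
make_hw dir1 Student2 homework1 "s2 hw1"
make_hw dir1 Student2 homework2 "s2 hw2"

make_hw dir2 Student1 homework1 "s1 hw1"
make_hw dir2 Student1 homework2 "s1 hw2 (dir2 version)"
make_hw dir2 Student3 homework1 "s3 hw1"
make_hw dir2 Student3 homework2 "s3 hw2"
```

Run it in an empty scratch directory; it creates dir1 and dir2 next to each other, which is what the scripts in the answers expect.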
Upvotes: 1
Views: 2047
Reputation: 7910
As far as I understand, you need to merge all files in two different directories into a new directory and you don't want duplicate files or folders.
Let's say you want to merge them into a 'merged' directory.
You can do this:
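rsync -hrv /dir1/ /merged/
rsync -hrv /dir2/ /merged/
All files in the /dir1 folder will be copied into the /merged folder, and then the same happens for the /dir2 folder. The trailing slash on each source tells rsync to copy the directory's contents rather than the directory itself; without it you would end up with /merged/dir1 and /merged/dir2. Note that rsync keeps only one copy of each path: a file that exists in both trees is overwritten by the /dir2 version when rsync sees a difference, so this does not produce the separate homework2_dir2 directory the question asks for.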
Upvotes: 0
Reputation: 1868
Assuming that dir1 and dir2 are relative paths containing no slashes (i.e. directories directly under the current directory):
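dir1=dir1
dir2=dir2
CMPDIR=$(pwd)/$dir2        # tree to compare against
cd "$dir1" || exit 1
BASEDIR=$(pwd)
for studentdir in *
do
    cd "$BASEDIR/$studentdir" || continue
    for homeworkdir in *
    do
        cd "$BASEDIR/$studentdir/$homeworkdir" || continue
        for workfile in *
        do
            other=$CMPDIR/$studentdir/$homeworkdir/$workfile
            # skip files that have no counterpart in dir2
            [ -e "$other" ] || continue
            # cmp -s prints nothing and exits non-zero when the files differ
            if ! cmp -s "$workfile" "$other"
            then
                # alternate homework directory inside the student directory
                altdir=$BASEDIR/$studentdir/${homeworkdir}_${dir2}
                mkdir -p "$altdir"
                ln "$other" "$altdir/"
            fi
        done
    done
done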
I haven't tried this - there may be some typos.
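In dir1, recurse into each student folder, and in each student folder into each homework directory. In each homework directory, use cmp on each file to check whether it is byte-identical to the matching file in the dir2 subtree. If it differs, create an alternate homework directory (the homework name plus _dir2) in the student directory, and hard-link (ln) the dir2 version of the file into it.
cmp is faster than diff because it only reports whether, and where, two files first differ instead of computing a line-by-line diff; ln is faster than cp because a hard link adds a second name for the existing data instead of copying it (which also means both trees must be on the same filesystem).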
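The two building blocks can be checked in isolation. This is only a sketch: the scratch directory and the file names are hypothetical, and it uses bash's -ef test (true when both paths refer to the same device and inode) to show that ln created a hard link rather than a copy.

```bash
#!/bin/bash
# Hypothetical scratch tree standing in for dir1 (a), dir2 (b) and an alternate dir.
work=$(mktemp -d)
mkdir "$work/a" "$work/b" "$work/alt"
printf 'same\n' > "$work/a/hw.txt"
printf 'same\n' > "$work/b/hw.txt"
printf 'one\n'  > "$work/a/diff.txt"
printf 'two\n'  > "$work/b/diff.txt"

# cmp -s is silent; exit status 0 means identical, 1 means the files differ
cmp -s "$work/a/hw.txt"   "$work/b/hw.txt";   same_status=$?
cmp -s "$work/a/diff.txt" "$work/b/diff.txt"; diff_status=$?
echo "identical: $same_status, different: $diff_status"

# ln gives the existing file a second name; no data is copied
ln "$work/b/diff.txt" "$work/alt/diff.txt"
if [ "$work/b/diff.txt" -ef "$work/alt/diff.txt" ]; then
    echo "hard link shares the inode"
fi
```

Because the script only tests exit statuses, the same pattern (cmp -s in an if, then ln) is what the loop above relies on.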
That's all, folks.
Upvotes: 1
Reputation: 242423
I'm not sure it's faster than your solution, as you didn't post it.
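#!/bin/bash
mkdir output
cp -r dir1/* output
cd dir2 || exit 1
for student in Student* ; do
    (
        cd "$student" || exit 1
        out_path=../../output/$student
        [[ -d $out_path ]] || mkdir "$out_path"
        for file in * ; do
            if [[ -e $out_path/$file ]] ; then
                # homework exists in both trees: keep one copy if identical,
                # otherwise store the dir2 version as <name>_dir2
                diff -rq "$file" "$out_path/$file" > /dev/null \
                    || cp -r "$file" "$out_path/${file}_dir2"
            else
                # homework (or whole student) only in dir2: copy it as-is
                cp -r "$file" "$out_path/"
            fi
        done
    )
done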
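Because the homework entries are directories, one diff detail matters here: without -r, diff compares only the top-level files of two directories and merely reports the subdirectories they share. The sketch below, with hypothetical directory and file names, compares two homework directories that differ only in a nested file; the exit statuses it relies on assume GNU diff.

```bash
#!/bin/bash
# Hypothetical homework directories: top-level file matches, nested file differs.
work=$(mktemp -d)
mkdir -p "$work/hw_dir1/part" "$work/hw_dir2/part"
printf 'same\n' > "$work/hw_dir1/main.txt"
printf 'same\n' > "$work/hw_dir2/main.txt"
printf 'old\n'  > "$work/hw_dir1/part/extra.txt"
printf 'new\n'  > "$work/hw_dir2/part/extra.txt"

# Without -r, diff only notes "Common subdirectories" and does not look inside part/
diff -q "$work/hw_dir1" "$work/hw_dir2" > /dev/null; flat_status=$?
# With -r it descends into part/ and detects the difference
diff -rq "$work/hw_dir1" "$work/hw_dir2" > /dev/null; deep_status=$?
echo "diff -q: $flat_status, diff -rq: $deep_status"
```

A zero status from plain diff -q would make two different homework directories look identical, which is why the recursive form is the safer test in the loop above.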
Upvotes: 0