Reputation: 2510
I'm trying to do what the title says. Here is an example:
Directory tree (A, B, C, D, H, F and G are my files):
dir0/
dir0/A // same MD5sum as B
dir0/C
dir0/D // same MD5sum as F and G
dir0/dir1/B // same MD5sum as A
dir0/dir1/H
dir0/dir1/dir2/G // same MD5sum as F and D
dir0/dir1/dir2/F // same MD5sum as G and D
I use this command:
find dir0/ -type f -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=prepend | awk '{ print $2 }'
It finds all files in dir0 and its subdirectories, calculates their MD5 sums, sorts them, keeps only the files whose sums are duplicated, splits them into groups, and prints only the file paths.
This works, and I get this output:
dir0/A ]
dir0/dir1/B ] first group
dir0/D ]
dir0/dir1/dir2/F ]
dir0/dir1/dir2/G ] second group
How can I get the output in the following form instead (all files with the same MD5sum on one line, obviously without the "first group", "second group" labels)?
dir0/A dir0/dir1/B ] first group
dir0/D dir0/dir1/dir2/F dir0/dir1/dir2/G ] second group
Upvotes: 0
Views: 90
Reputation: 204718
The shortest way to do this would be to add a pipeline step like this:
awk 'BEGIN{RS=RS RS}{$1=$1}1'
RS = RS RS causes Awk to use "\n\n" as its record separator, thus reading each block as a single record. The FS field separator is whitespace, which includes newlines, so we don't have to do any work to split the lines.
$1 = $1 doesn't really change the value of $1, but Awk thinks it could have, which means it'll reconstruct $0 (which currently has newlines in it) from $1, $2, etc., joining with OFS (which is " " by default).
1 causes Awk to print $0 (and ORS, which is still a single newline) on every record.
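To try the joining step without building the directory tree, here is a minimal sketch. It feeds a simulated copy of the grouped path list from the question into Awk. Note that it swaps in Awk's paragraph mode (RS set to the empty string) rather than RS = RS RS: paragraph mode is specified by POSIX, while a multi-character RS may behave differently between Awk implementations. The function names grouped_paths and join_groups are made up for this illustration.

```shell
#!/bin/sh
# Simulated output of the question's pipeline: groups of paths separated
# by blank lines, with one blank line before the first group.
grouped_paths() {
  printf '\ndir0/A\ndir0/dir1/B\n\ndir0/D\ndir0/dir1/dir2/F\ndir0/dir1/dir2/G\n'
}

# Paragraph mode (RS = ""): blank lines separate records, and newlines
# act as field separators. $1 = $1 forces Awk to rebuild $0 with OFS
# (a single space), and the trailing 1 prints every rebuilt record.
join_groups() {
  awk 'BEGIN { RS = "" } { $1 = $1 } 1'
}

grouped_paths | join_groups
# Prints:
# dir0/A dir0/dir1/B
# dir0/D dir0/dir1/dir2/F dir0/dir1/dir2/G
```

In paragraph mode, leading and trailing blank lines do not produce empty records, so the blank line printed before the first group is harmless.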
Upvotes: 1