lzap

Reputation: 17174

How to copy files flattening into one dir using find and xargs (GNU Bash)

I need to find all files (e.g. with the extension ABC) and copy them into one directory, creating unique filenames so that files that happen to share a name do not overwrite each other.

Something like this:

find /tmp -name \*.ABC | xargs cp '{}' somedir/$(echo {} | md5sum | cut -c1-6){} \;

Creating files like:

b786af1_original_name.ABC
a7af335_original_name_2.ABC
...

The command above obviously cannot work because the $( ... ) substitution is evaluated only once, by the shell, before xargs even starts. I need it evaluated for every file name.

How to do that?

Upvotes: 1

Views: 1300

Answers (5)

Marcin

Reputation: 3524

Important side note: watch how much of the hash you use as an identifier.

If you're using 6 hex digits (16 possible values each), that's fewer than 17 million combinations (16^6 = 16,777,216). So if you have 5000 files that you're identifying this way, you have about a 52% chance of at least one collision. 7 hex chars yields about 4.5%, and 8 hex chars about 0.3% (for the same 5000 files).
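The figures above follow from the usual birthday approximation, p ≈ 1 - exp(-n(n-1) / (2 * 16^d)) for n files and d hex digits. Here is a small sketch to check them for other sizes; the function name collision_pct is just made up for illustration, and it assumes the hash prefixes are uniformly distributed:

```shell
# collision_pct N DIGITS
# Prints the approximate chance (in percent) that at least two of N
# files get the same DIGITS-character hex prefix.
# Assumption: prefixes are uniformly random (birthday approximation).
collision_pct() {
  awk -v n="$1" -v d="$2" 'BEGIN {
    space = 16 ^ d                            # number of possible prefixes
    p = 1 - exp(-n * (n - 1) / (2 * space))   # birthday approximation
    printf "%.1f\n", p * 100
  }'
}

collision_pct 5000 6
collision_pct 5000 7
collision_pct 5000 8
```

The three calls correspond to the 6, 7 and 8 character cases mentioned above.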

Upvotes: 0

Idelic

Reputation: 15582

Use mktemp to generate unique names:

find /tmp -name \*.ABC | while IFS= read -r f; do
  cp "$f" "$(mktemp /destination/dir/XXXXXXXXXX.ABC)"
done

Upvotes: 0

Gordon Davisson

Reputation: 125838

For the record, here's a weird-filename-proofed version of @Ken's answer:

find /tmp -name \*.ABC -print0 | while IFS= read -r -d $'\0' i; do cp "$i" "$(basename "$i" | md5sum | cut -c1-6)$(basename "$i")"; done

See BashFAQ #20 for details, variants, etc.
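One thing to watch: hashing only the basename gives two files with the same name the same prefix, so they would still overwrite each other. Below is a rough sketch of a variant that hashes the full path instead (as the question's own attempt tried to do) and copies into a target directory. The function name flatten_copy, the 8-character prefix and the underscore separator are my own choices for illustration, and it assumes the destination lies outside the tree being searched:

```shell
# flatten_copy SRC DEST  (hypothetical helper, not from the answers here)
# Copies every *.ABC file under SRC into DEST as <hash>_<original name>,
# where <hash> is taken from the md5 of the full path, so that files
# with the same name in different directories get different prefixes.
# Assumption: DEST is not located inside SRC.
flatten_copy() {
  mkdir -p "$2" || return
  find "$1" -type f -name '*.ABC' -exec sh -c '
    dest=$1; shift
    for path do
      name=${path##*/}                                  # basename of the file
      prefix=$(printf "%s" "$path" | md5sum | cut -c1-8)
      cp -- "$path" "$dest/${prefix}_$name"
    done
  ' sh "$2" {} +
}
```

Usage would look like `flatten_copy /tmp somedir`. Passing the file names through `-exec ... {} +` rather than a pipe avoids the problems with spaces and newlines in names, without needing `-print0`.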

Upvotes: 1

Ken

Reputation: 78852

Why not read?

find /tmp -name \*.ABC | while read i; do cp $i $(basename $i | md5sum | cut -c1-6)$(basename $i); done;

Upvotes: 2

olan

Reputation: 3648

How about a random int based on the current nanosecond?

date +%N | sed -e 's/000$//' -e 's/^0//'
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#  Strip off leading and trailing zeroes, if present.
#  Length of generated integer depends on
#  + how many zeroes stripped off.

The probability of two files ending up with the same name is very small with this method.

Source: http://tldp.org/LDP/abs/html/timedate.html

EDIT: actually this will just give you the same problem. Does it need to be a one-liner?

Upvotes: 1
