C.Edelweiss

Reputation: 161

Script to move all files starting with the same 7 characters into a folder named after those 7 characters

All files are in a directory (over 500 000 files), named in the following pattern

AR00001_1
AR00001_2
AR00001_3
AR00002_1
AR00002_2
AR00002_3

I need a script, either batch or Unix shell, that takes every file starting with AR00001 and moves it into a new folder called AR00001, then does the same for the AR00002 files, and so on.

Here's what I've tried so far:

for f in *_*; do
   DIR="$( echo ${f%.*} | tr '_' '/')"
   mkdir -p "./$DIR"
   mv "$f" "$DIR"
done

Thanks

// Update

Ran this in the CMD

for %F in (c:\test\*) do (md "d:\destination\%~nF"&move "%F" "d:\destination\%~nF\") >nul

Seems to be almost what I wanted, except that it does not take the first 7 characters as a substring but instead creates a folder for each file :/ I'm trying to mix it with your solutions.

Upvotes: 0

Views: 2627

Answers (3)

Stephan

Reputation: 56208

@echo off 
setlocal enabledelayedexpansion
for %%a in (???????_*) do (
  set "x=%%a"
  set "x=!x:~0,7!"
  md "!x!" 2>nul
  move "!x!*" "!x!\" 2>nul
)

For every matching file:
- get the first 7 characters
- create a folder with that name (ignoring the error message if it already exists)
- move all files that start with those 7 characters (ignoring error messages if the files no longer exist because they were already moved)
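
For readers on a Unix shell, the same three steps can be sketched in bash. This is only an illustration under assumptions: `group_by_prefix` is a hypothetical helper name, and the function takes the directory to process as its argument rather than working in the current directory as the batch file does.

```shell
#!/usr/bin/env bash
# Sketch: bash analogue of the steps above (not part of the original batch answer).
# group_by_prefix is a hypothetical helper; $1 is the directory holding the files.
group_by_prefix() {
    local dir=$1 f prefix
    (
        cd "$dir" || exit 1
        shopt -s nullglob                 # an unmatched glob expands to nothing
        for f in ???????_*; do
            # The glob was expanded once up front, so names moved by an
            # earlier iteration no longer exist here: skip them.
            [[ -f $f ]] || continue
            prefix=${f:0:7}               # first 7 characters
            mkdir -p -- "$prefix"         # no error if the folder already exists
            mv -- "$prefix"_* "$prefix"/  # move every file sharing the prefix
        done
    )
}
```

It would be called as `group_by_prefix /path/to/files`. Running the loop in a subshell keeps the `cd` and `shopt` changes from leaking into the calling shell.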

Upvotes: 3

wardies

Reputation: 1259

The following achieves the desired effect and checks for non-existence of the target directory each time before creating it.

@echo off
setlocal ENABLEDELAYEDEXPANSION
set "TOBASE=c:\target\"
set "MATCHFILESPEC=AR*"
for %%F in ("%MATCHFILESPEC%") do (
    set "FILENAME=%%~nF"
    set "TOFOLDER=%TOBASE%!FILENAME:~0,7!"
    if not exist "!TOFOLDER!\" md "!TOFOLDER!"
    move "%%F" "!TOFOLDER!" >nul
)
endlocal

The move command moves only the current file rather than using a wildcard, so it never consumes file names that the loop is about to visit on a later iteration. This keeps things simple, on the assumption that efficiency is not of prime importance.

I'd recommend prototyping by creating batch files (with a .bat or .cmd extension) rather than trying to do complex tasks interactively using one-liners. The behaviour can be different and there are more things you can do in a batch file, such as using setlocal to turn on delayed expansion of variables. It's also just a pain writing for loops using %F interactively, only to have to remember to convert all of those to %%F, %%~nF, etc. when pasting into a batch file for posterity.

One word of caution: with 500,000 files in the folder, and all of the files having very similar prefixes, if your file system has 8.3 directory naming turned on (which is often the default) it is possible to run into problems using wildcards. This happens as the 8.3 namespace gets more and more busy and there are fewer and fewer options for ways the file name can be encoded in 8 characters. (The hash table fills up and starts overflowing into unexpected file names).

One solution is to turn that feature off on the server but that may have severe implications for any legacy applications. To see what the file looks like in 8.3 naming scheme, you can do, e.g.:

dir /x /p AR*

... which might give you something like (where the left hand name is the one converted to 8.3):

ARB900~1.TST AR15467_RW322.tst
AR85E3~1.TST AR15468_RW322.tst
ARDDFE~1.TST AR15469_RW322.tst
AR1547~1.TST AR15470_RW322.tst
AR1547~2.TST AR15471_RW322.tst
...

In this example, since the first two characters seem to be maintained, there should be no conflict.

So, for example, if I run for %a in (AR8*) do @echo %a, I get output that might at first seem incorrect:

AR15468_RW322.tst
AR18565_RW322.tst
AR20376_RW322.tst
AR14569_RW322.tst
AR17278_RW322.tst
...

But this is actually correct; it is all the files that match AR8* in both the long file name and short file name formats.

Edit: I am aware in retrospect that this solution looks very similar to Stephan's, and I had browsed through the existing answers before starting work on my own, so I should credit him. I will try and save face by pointing out a benefit of Stephan's solution. Its use of wildcards should circumvent any 8.3 naming issue: by specifying the wildcard as ???????_*, it only catches the long file names and won't match any of the converted 8.3 file names (all of which are devoid of underscores in that position). Similarly, a wildcard such as AR?????_* would do the same.
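
As a rough illustration of why the underscore in the pattern matters, the sketch below checks one long name and two of the short names from the dir /x listing above against AR?????_*. It uses bash pattern matching, which is an assumption made for illustration: cmd.exe wildcard rules differ from bash globs in edge cases, so this shows the idea rather than reproducing cmd's exact behaviour. `matches_long_name` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: bash glob matching, not cmd.exe wildcard matching.
# matches_long_name is a hypothetical helper for illustration.
matches_long_name() {
    # True when the name has "AR", five more characters, then an underscore.
    [[ $1 == AR?????_* ]]
}

for name in 'AR15470_RW322.tst' 'AR1547~1.TST' 'ARB900~1.TST'; do
    if matches_long_name "$name"; then
        echo "$name: match"
    else
        echo "$name: no match"
    fi
done
```

Only the long name has an underscore as its eighth character; the sample short names have a digit there, so the pattern rejects them.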

Upvotes: 2

glenn jackman

Reputation: 247012

With bash, you'd write:

for f in *; do 
    [[ -d $f ]] && continue   # skip existing directories
    prefix=${f:0:7}           # substring of first 7 characters
    mkdir -p "$prefix"        # create the directory if it does not exist
    mv "$f" "$prefix"         # and move the file
done

For the substring expansion, see https://www.gnu.org/software/bash/manual/bash.html#Shell-Parameter-Expansion -- this is probably the bit you're missing.
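
A minimal sketch of the substring expansion on its own, using file names from the question. `prefix_of` is a hypothetical helper introduced only for this illustration.

```shell
#!/usr/bin/env bash
# ${var:offset:length} extracts a substring; offset 0 and length 7
# give the first 7 characters. prefix_of is a hypothetical helper.
prefix_of() {
    local name=$1
    printf '%s\n' "${name:0:7}"
}

prefix_of AR00001_1    # prints AR00001
prefix_of AR00002_3    # prints AR00002
prefix_of short        # prints short (names under 7 characters come back whole)
```

The last call hints at an edge case: a file whose name is 7 characters or fewer would have a prefix equal to its own name, so the loop above could not create a directory for it. With the naming pattern in the question this should not arise.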

Upvotes: 1
