Jaanna

Reputation: 1670

copy and rename files based on number in previous file

Good day,

I have a destination directory with the following files:

V1__baseline.sql
V2__inserts.sql
V3__packages.sql
...
V10__change_table.sql

Then I have a source directory with the following files:

v000_001_mk-tbl-dwa_ranking.sql
v000_002_mk-tbl-dwa_camp_week.sql
...
...
v000_179_crt_table_stat_flg.sql
v000-180_crt_table_ing_flg.sql
v000-181_crt_table_update_flg.sql

I would like to copy every file that comes after v000_179_crt_table_stat_flg.sql, whether it exists now or is added in future, from the source to the destination, and rename the copies so that they continue the destination's numbering. The destination directory should then look like this:

V1__baseline.sql
V2__inserts.sql
V3__packages.sql
...
V10__change_table.sql
V11__crt_table_ing_flg.sql
V12__crt_table_update_flg.sql

In other words, the file name format in the destination is V{number}__{name}.sql, whereas the format in the source is v000-{number}_{name}.sql.

How can I do it? I assume I'll need a clever looping script with a command something like this:

cp "`ls -Art ${source_dir}/* | tail -n 1`"  ${destination_dir}/

Upvotes: 0

Views: 197

Answers (2)

csknk

Reputation: 2039

Because the question specifies renaming and copying rather than renaming and moving files, the solution must presumably make sure that files from the source directory are not duplicated in the destination. This complicates the solution.

The script can't simply check for the existence of the source file in the destination, because the file is renamed as part of the copy. Running cmp or diff against the destination files is probably wasteful of resources, especially if the files to be compared are large database dumps (hinted at by the .sql extension).

In the solution below I've added a manifest file to track which files have been copied, but if I were building this for myself I wouldn't be comfortable with this approach. If the manifest file were accidentally deleted or edited, the script would lose track of which files had already been copied, and on the next run every file would be copied again, throwing off the sequential numbering in the destination directory. If possible, I think it would be better to either:

  • Rename the source files to reflect their copied status, thereafter excluding these files from the copy operation
  • Rename & move the files from source directory to the destination
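
The first option might look roughly like the sketch below. The ".copied" suffix and the copy_and_mark function name are assumptions made for illustration; any marker that the copy loop can recognise and skip would do.

```shell
#!/bin/bash
# Sketch: copy one source file to the destination under its new name, then
# mark the source as handled by appending a suffix.
# ".copied" and copy_and_mark are hypothetical names, not part of the question.
copy_and_mark() {
    local src_file=$1 dest_file=$2

    # Skip anything already marked by an earlier run
    [[ $src_file == *.copied ]] && return 0

    # Only mark the source if the copy itself succeeded
    cp -- "$src_file" "$dest_file" && mv -- "$src_file" "$src_file.copied"
}

# Example call (paths are placeholders):
# copy_and_mark src/v000-180_crt_table_ing_flg.sql dest/V11__crt_table_ing_flg.sql
```

With this in place no manifest is needed, at the cost of changing the names in the source directory.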

Note that when making numerical comparisons, bash sees numbers with leading zeros as octal. You could remove leading zeros when extracting the number for comparison, but I used $((10#$foo)) in the test conditions to specify decimal numbers. I think this has messed up Stack Overflow's syntax highlighting - which wrongly treats text after the # in 10# as a comment.
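
A small demonstration of the octal issue, in case it helps. The to_decimal helper name is made up for this example and is not used in the script.

```shell
#!/bin/bash
# Force base-10 interpretation of a zero-padded number.
# to_decimal is a hypothetical helper name used only for this demonstration.
to_decimal() {
    echo "$(( 10#$1 ))"
}

echo "$(( 010 ))"   # prints 8, because the leading zero means octal
to_decimal 010      # prints 10
to_decimal 089      # prints 89; a bare $((089)) is an error, since 8 and 9 are not octal digits
```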

#!/bin/bash

# Set source and destination paths
readonly SRC=src
readonly DEST=dest
readonly COPY_MANIFEST="${SRC}"/copied.txt

# $COPY_MANIFEST will keep track of which files have been copied
[[ -f "$COPY_MANIFEST" ]] || touch "$COPY_MANIFEST"

# Get the highest index in destination directory from the file numeric prefix
highest=0
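for file in "$DEST"/*; do
    base=$(basename "$file")
    # Take only the digits between the leading V and the first underscore
    index=$(echo "$base" | sed -n 's/^V\([0-9][0-9]*\)_.*/\1/p')
    # Skip anything that does not follow the V{number}_ naming scheme
    [[ -n $index ]] || continue
    # Compare numbers. Convert to decimal format because leading zeros denote octal numbers
    [[ $((10#$highest)) -le $((10#$index)) ]] && highest=$index
done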

# Rename and copy files from source to destination
for original in "$SRC"/*; do
    previously_copied=n

    # Don't process the manifest file
    [[ "$original" = "$COPY_MANIFEST" ]] && continue

    # Skip anything that is not a regular file (for example a subdirectory)
    [[ -f "$original" ]] || continue

    # Check the file has not already been copied - uses a manifest file rather 
    # than using tools like cmp or diff to check for duplicate files.
    while IFS= read -r line; do
        if [[ "${original}" = "${line}" ]]; then
            echo "${original} has already been renamed and copied."
            previously_copied=y
        fi
    done < "$COPY_MANIFEST"
    [[ $previously_copied = y ]] && continue

    # Get the base name of the file
    name=$(basename "$original")

    # Original question asks that all files greater than v000_179_crt_table_stat_flg.sql are copied.
    # If this requirement is not needed, the next 2 lines can be removed
    num=$(echo "$name" | sed -n 's/^[Vv][0-9]*[-_]\([0-9][0-9]*\)_.*/\1/p')
    [[ $((10#${num:-0})) -le 179 ]] && { echo "Not eligible"; continue; }

    # Build the new filename and copy
    # Get rid of the prefix, leaving the descriptive name
    name=${name#[Vv][0-9]*[-_][0-9]*_}
    highest=$(( 10#$highest + 1 ))
    new_name=V${highest}__${name}
    cp "$original" "$DEST/$new_name"

    # Update the manifest to prevent repeat copying
    echo "$original" >> "$COPY_MANIFEST"
done
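
As a possible simplification, the inner while loop that scans the manifest could be replaced by a single grep call. This is only a sketch; already_copied is a hypothetical helper name, and it assumes the manifest holds one source path per line, as written by the script above.

```shell
#!/bin/bash
# Succeed if the exact path "$1" is already recorded in the manifest file "$2".
# already_copied is a hypothetical helper name.
already_copied() {
    # -F: treat the path as a fixed string, not a regex
    # -x: match the whole line only
    # -q: print nothing, just set the exit status
    grep -Fxq -- "$1" "$2"
}

# Inside the copy loop this could be used as:
# already_copied "$original" "$COPY_MANIFEST" && continue
```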

Upvotes: 1

Paul Hodges

Reputation: 15388

Rough version -

targetDir=. # adjust as needed
declare -i ctr=1
declare -a found=()
declare -l file
for file in [Vv][0]*            # refine this to get the files you want
do x=${file#v}                  # knock off the leading v
   while [[ "$x" =~ ^[0-9_-] ]] # if leading digits/dashes/underscores
   do x=${x:1}                  # strip them
   done
   found=( "$targetDir"/V${ctr}__* )    # check for existing enumerator in the target
   while [[ -e "${found[0]}" ]]         # if found
   do (( ctr++ ))                       # increment
      found=( "$targetDir"/V${ctr}__* ) # and check again
   done
   mv "$file" "$targetDir/V${ctr}__$x" # move the file
done
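
If you also need the sequence number (for example to skip everything up to 179), one way to refine the prefix handling is a single regex match instead of stripping characters one at a time. This is a sketch only: parse_source_name is a made-up name, and the pattern assumes the v000-{number}_{name}.sql layout from the question, accepting either "-" or "_" after the v000 part.

```shell
#!/bin/bash
# Split a source name into its sequence number and descriptive part.
# parse_source_name is a hypothetical helper; the regex is an assumption
# based on the file names shown in the question.
parse_source_name() {
    local re='^[Vv][0-9]+[-_]([0-9]+)_(.+)$'
    [[ $1 =~ $re ]] || return 1
    # 10# forces decimal so that zero-padded numbers are not read as octal
    echo "$(( 10#${BASH_REMATCH[1]} )) ${BASH_REMATCH[2]}"
}

parse_source_name v000-180_crt_table_ing_flg.sql    # prints: 180 crt_table_ing_flg.sql
parse_source_name v000_001_mk-tbl-dwa_ranking.sql   # prints: 1 mk-tbl-dwa_ranking.sql
```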

Please read over, ask questions, and edit to suit your specific needs.

Upvotes: 1
