rarivero

Reputation: 15

How can I sort filenames within multiple directories into one sequential and numerically ascending array/list?

Let's say I have three directories, each containing a different number of files (though in this simplified case, each has four):

BA-2016-05:

AG-1829A.jpg
AG-1829B.jpg
AG-1829C.jpg
AG-1830A.jpg

BA-2016-V01:

AG-1712A.jpg
AG-1712B.jpg
AG-1922A.jpg
AG-1922B.jpg

BA-2017-PD02:

AG-1100A.jpg
AG-1100B.jpg
AG-1100C.jpg
AG-1100D.jpg

I want the resulting array to look something like this:

AG-1100A.jpg AG-1100B.jpg AG-1100C.jpg AG-1100D.jpg
AG-1712A.jpg AG-1712B.jpg
AG-1829A.jpg AG-1829B.jpg AG-1829C.jpg
AG-1830A.jpg
AG-1922A.jpg AG-1922B.jpg

The array will be saved to a .txt document and can be space or tab delimited.

So far, I've slightly adapted a response from elsewhere online to list all the files sorted by filename in ascending order:

find ~/BA* -iname "*.jpg" |\
awk -vFS=/ -vOFS=/ '{ print $NF,$0 }' |\
sort -n -t / |\
cut -f2- -d/

It should be easy enough to cut off the beginning of the path using filename="${fullpath##*/}", but after that is where I'm stuck. How do I turn this list into an array that's formatted as mentioned above?

A few notes:

  1. The format of the filenames will always be AG-[numbers][A-D] or, to make it more generic, [letters][hyphen][numbers][A-D].
  2. The extensions will always be .jpg or .JPG, but bonus points for one that works with all extensions and preserves them in the output array.
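The filename format described in these notes can be written down as a bash regular expression. This is only a sketch: the variable names (re, name, prefix, number, letter, ext) are illustrative, and it assumes the letter is always an uppercase A-D as stated above.

```shell
#!/bin/bash
# Pattern for [letters][hyphen][numbers][A-D], followed by an optional extension
re='^([[:alpha:]]+)-([0-9]+)([A-D])(\..+)?$'

name='AG-1829C.jpg'
if [[ $name =~ $re ]]; then
    prefix=${BASH_REMATCH[1]}   # AG
    number=${BASH_REMATCH[2]}   # 1829
    letter=${BASH_REMATCH[3]}   # C
    ext=${BASH_REMATCH[4]}      # .jpg (kept as-is, so .JPG or any other extension survives)
    echo "$prefix $number $letter $ext"
fi
```

The captured number is the part that decides which output line a file belongs to.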

EDIT: I include the final solution I'm using below. It includes a mix of things from both answers I got, plus some gimmicky awk stuff before the output is made to change spaces for tabs. Works like a charm. I also realized I actually needed to include a URL that would be completed by incorporating the filename/path into it. But I was able to figure that out pretty quickly. Anyway, thanks to all for your help and here's the final code:

#!/bin/bash

# The number of the current line
current_nb=;

# Variable to store the current line before writing it
line=;

# Loop through all regular files of the directories and subdirectories specified
# Sort all file paths in ascending order (irrespective of the directory name)
for file in $(find ./BA* -iname "*.jpg" -printf '%f/%p\n' | sort -n -t / | cut -f2- -d/); 
do 

    # Append image URL to each file in the loop
    file_url=`sed 's|^.*/\(.*/.*\)|[INSERT URL HERE]/\1|' <<< "$file"`;

    # Extract the number from the current file in the loop
    nb=`sed 's/.*-\([0-9]\+\)[[:alpha:]].*/\1/' <<< "$file"`; 

    # For the first loop, when $current_nb is not initialized, we set $nb as the default value
    current_nb=${current_nb:-$nb}; 

    # If we stay on the same line...
    if [ "$nb" -eq "$current_nb" ]; 
        then 
        # ...then concatenate the new filename with the line currently being created
        line="$line $file_url"; 

        else 
        # Otherwise, append the line at the end of the output file (changing spaces to tabs)...
        echo $line | awk -v OFS="\t" '$1=$1' >> url_list.txt; 

        # ...and prepare a new line
        line="$file_url ";
        current_nb=$nb; 
    fi; 

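done;

# Write the last line, which the loop itself never flushes
[ -n "$line" ] && echo $line | awk -v OFS="\t" '$1=$1' >> url_list.txt;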

Upvotes: 1

Views: 243

Answers (2)

Wrotcod

Reputation: 50

This is more generic and works for all extensions. In addition, I do not create any array, but write the result directly into the output file.

#!/bin/bash
# The number of the current line
current_nb=;
# Variable to store the current line before writing it
line=;
# Loop through all regular files of this directory and its subdirs sorted
# We extract the basename (e.g. AG-1829A.jpg )
for file in $(find . -type f -exec basename {} \; | sort -n); do 
    # Extract its number
    nb=`sed 's/.*-\([0-9]\+\)[[:alpha:]].*/\1/' <<<"$file"`; 
    # For the first loop, when current_nb is not initialized, we set $nb as default value
    current_nb=${current_nb:-$nb}; 
    # If we stay on the same line
    if [ "$nb" -eq "$current_nb" ]; then 
        # Concatenate the new filename
        line="$line $file"; 
    else 
        # Else append the line at the end of file
        echo $line >> out.txt; 
        # And prepare the new one
        line="$file ";
        current_nb=$nb; 
    fi; 
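done;
# Write the last line, which the loop itself never flushes
[ -n "$line" ] && echo $line >> out.txt;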
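The same grouping can also be sketched as a single awk pass over the sorted names. This is a minimal sketch, not the method used above: the function name group_by_number is hypothetical, and the sample names are taken from the question. The END block writes the final group, so nothing has to be flushed after the loop.

```shell
#!/bin/bash
# group_by_number: hypothetical helper. Reads sorted file names on stdin and
# prints names that share the same number on one tab-separated line.
group_by_number() {
    awk '
    {
        # Keep only the digits that follow the first hyphen
        nb = $0
        sub(/^[^-]*-/, "", nb)
        sub(/[^0-9].*$/, "", nb)
        if (NR > 1 && nb != prev) {
            print line                 # number changed: write the finished line
            line = $0
        } else {
            line = (NR == 1) ? $0 : line "\t" $0
        }
        prev = nb
    }
    END { if (NR > 0) print line }     # write the final group
    '
}

# Sample input taken from the question
grouped=$(printf '%s\n' AG-1100A.jpg AG-1100B.jpg AG-1712A.jpg AG-1829A.jpg | group_by_number)
printf '%s\n' "$grouped"
```

In practice the function would sit at the end of a pipeline such as find ./BA* -iname "*.jpg" -printf '%f\n' | sort -n | group_by_number >> out.txt, which assumes GNU find for -printf.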

Upvotes: 0

John1024

Reputation: 113834

The sorted list

To generate the list that you want:

$ find ./BA* -iname "*.jpg" -printf '%f\n' | sort -n
AG-1100A.jpg
AG-1100B.jpg
AG-1100C.jpg
AG-1100D.jpg
AG-1712A.jpg
AG-1712B.jpg
AG-1829A.jpg
AG-1829B.jpg
AG-1829C.jpg
AG-1830A.jpg
AG-1922A.jpg
AG-1922B.jpg

find's -printf action allows customized output. Since you only want file names without their directories, we use the %f format option to -printf.
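To see the difference between %f and the full path (%p), here is a small self-contained sketch. It assumes GNU find, since -printf is not available in every find implementation, and it builds a throwaway directory whose name is only illustrative.

```shell
#!/bin/bash
# Build a throwaway tree so the example does not depend on existing files
tmp=$(mktemp -d)
mkdir "$tmp/BA-2016-05"
touch "$tmp/BA-2016-05/AG-1829A.jpg"

# %f prints the file name with all leading directories removed
name_only=$(find "$tmp" -iname '*.jpg' -printf '%f\n')
# %p prints the path as find sees it, starting from the search root
full_path=$(find "$tmp" -iname '*.jpg' -printf '%p\n')

printf '%%f -> %s\n%%p -> %s\n' "$name_only" "$full_path"
rm -r "$tmp"
```

The script in the question prints both at once ('%f/%p\n') so that it can sort on the name and still keep the path.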

Create array (naive version)

If the file names are guaranteed not to contain whitespace or any other shell-active characters, then the following works:

arr=($(find ./BA* -iname "*.jpg" -printf '%f\n' | sort -n))

We can verify that the array arr contains what you want via:

$ declare -p arr
declare -a arr=([0]="AG-1100A.jpg" [1]="AG-1100B.jpg" [2]="AG-1100C.jpg" [3]="AG-1100D.jpg" [4]="AG-1712A.jpg" [5]="AG-1712B.jpg" [6]="AG-1829A.jpg" [7]="AG-1829B.jpg" [8]="AG-1829C.jpg" [9]="AG-1830A.jpg" [10]="AG-1922A.jpg" [11]="AG-1922B.jpg")
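A quick sketch of why the whitespace caveat matters. The two names below are made up for illustration and stand in for find output; one of them contains a space.

```shell
#!/bin/bash
# Simulated find output: two names, one containing a space (hypothetical names)
list=$'AG-1100 A.jpg\nAG-1100B.jpg'

# Unquoted command substitution splits on spaces, tabs and newlines
naive=($(printf '%s\n' "$list"))

echo "${#naive[@]}"   # 3 elements, although there are only 2 names
declare -p naive
```

The name with the space ends up as two separate array elements, which is the failure the naive version is exposed to.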

Create array (robust version)

To handle the most general file names:

array=()
while IFS= read -r -d $'\0'; do
   array+=("$REPLY")
done < <(find ./BA* -iname "*.jpg" -printf '%f\0' | sort -zn)

To verify the result:

$ declare -p array
declare -a array=([0]="AG-1100A.jpg" [1]="AG-1100B.jpg" [2]="AG-1100C.jpg" [3]="AG-1100D.jpg" [4]="AG-1712A.jpg" [5]="AG-1712B.jpg" [6]="AG-1829A.jpg" [7]="AG-1829B.jpg" [8]="AG-1829C.jpg" [9]="AG-1830A.jpg" [10]="AG-1922A.jpg" [11]="AG-1922B.jpg")

The robust version separates the file names with NUL characters. A full explanation of how this works can be found here.
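As a possible shorter variant, and assuming bash 4.4 or newer, mapfile can read the NUL-separated stream directly. In this sketch the printf call with two made-up names stands in for the find command, so that it runs without any files present.

```shell
#!/bin/bash
# Requires bash 4.4+ for mapfile -d. The sample names are hypothetical
# stand-ins for the output of: find ./BA* -iname "*.jpg" -printf '%f\0'
mapfile -d '' -t array < <(printf '%s\0' 'AG-1712 A.jpg' 'AG-1100B.jpg' | sort -zn)

declare -p array
```

The -d '' option makes mapfile split on NUL characters and -t drops the delimiter from each element, so a name containing a space stays in one piece.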

Upvotes: 2
