Reputation: 15
Let's say I have three directories that each contain a different number of files (though in this simplified case, each contains four):
BA-2016-05:
AG-1829A.jpg
AG-1829B.jpg
AG-1829C.jpg
AG-1830A.jpg
BA-2016-V01:
AG-1712A.jpg
AG-1712B.jpg
AG-1922A.jpg
AG-1922B.jpg
BA-2017-PD02:
AG-1100A.jpg
AG-1100B.jpg
AG-1100C.jpg
AG-1100D.jpg
I want the resulting array to look something like this:
AG-1100A.jpg AG-1100B.jpg AG-1100C.jpg AG-1100D.jpg
AG-1712A.jpg AG-1712B.jpg
AG-1829A.jpg AG-1829B.jpg AG-1829C.jpg
AG-1830A.jpg
AG-1922A.jpg AG-1922B.jpg
The array will be saved to a .txt document and can be space- or tab-delimited.
I've so far slightly adapted a response from elsewhere online to list all the sorted files by filename in ascending order:
find ~/BA* -iname "*.jpg" |\
awk -vFS=/ -vOFS=/ '{ print $NF,$0 }' |\
sort -n -t / |\
cut -f2- -d/
It should be easy enough to cut off the beginning of the path using filename="${fullpath##*/}", but after that is where I'm stuck. How do I turn this list into an array that's formatted as mentioned above?
A few notes:
The filenames follow the pattern AG-[numbers][A-D] or, to make it more generic, [letters][hyphen][numbers][A-D].
The extension is .jpg or .JPG, but bonus points for a solution that works with all extensions and preserves them in the output array.
EDIT: I include the final solution I'm using below. It includes a mix of things from both answers I got, plus some gimmicky awk stuff before the output is written to change spaces to tabs. Works like a charm. I also realized I actually needed to include a URL that would be completed by incorporating the filename/path into it. But I was able to figure that out pretty quickly. Anyway, thanks to all for your help and here's the final code:
#!/bin/bash
# The number of the current line
current_nb=
# Variable to store the current line before writing it
line=
# Loop through all .jpg files of the directories and subdirectories specified
# Sort all file paths in ascending order of filename (irrespective of the directory name)
# Note: the unquoted $(...) splits on whitespace, so this assumes no spaces in any path
for file in $(find ./BA* -iname "*.jpg" -printf '%f/%p\n' | sort -n -t / | cut -f2- -d/)
do
    # Prepend the image URL to the last directory and filename of each path
    # (| is the sed delimiter here because the replacement itself contains a slash)
    file_url=$(sed 's|^.*/\(.*/.*\)|[INSERT URL HERE]/\1|' <<< "$file")
    # Extract the number from the current file in the loop
    nb=$(sed 's/.*-\([0-9]\+\)[[:alpha:]].*/\1/' <<< "$file")
    # For the first loop, when $current_nb is not initialized, we set $nb as the default value
    current_nb=${current_nb:-$nb}
    # If we stay on the same line...
    if [ "$nb" -eq "$current_nb" ]
    then
        # ...then concatenate the new URL with the line currently being created
        line="$line $file_url"
    else
        # Otherwise, append the line at the end of the output file (changing spaces to tabs)...
        echo $line | awk -v OFS="\t" '$1=$1' >> url_list.txt
        # ...and prepare a new line
        line="$file_url "
        current_nb=$nb
    fi
done
# Write the last line, which is still pending when the loop ends
if [ -n "$line" ]
then
    echo $line | awk -v OFS="\t" '$1=$1' >> url_list.txt
fi
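As an aside, the URL prefixing can also be done without sed, using only parameter expansion. This is a minimal sketch, not the code I actually run: the helper name to_url and the base URL passed as its first argument are assumptions made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not part of the script above).
# $1: base URL (an assumption; substitute the real one)
# $2: a path as printed by find, e.g. ./BA-2016-05/AG-1829A.jpg
to_url() {
    local base_url=$1 path=$2
    local file=${path##*/}   # strip everything up to the last slash -> AG-1829A.jpg
    local dir=${path%/*}     # strip the file name                   -> ./BA-2016-05
    dir=${dir##*/}           # keep only the last directory          -> BA-2016-05
    printf '%s/%s/%s\n' "$base_url" "$dir" "$file"
}
```

Inside the loop this would replace the first sed call, e.g. file_url=$(to_url "https://example.com/images" "$file"), where the URL is only a placeholder.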
Upvotes: 1
Views: 243
Reputation: 50
This is more generic and works for all extensions. In addition, I do not create any array, but write the result directly into the output file.
#!/bin/bash
# The number of the current line
current_nb=
# Variable to store the current line before writing it
line=
# Loop through all regular files of this directory and its subdirs, sorted
# We extract the basename (e.g. AG-1829A.jpg)
# Note: the unquoted $(...) splits on whitespace, so file names must not contain spaces
for file in $(find . -type f -exec basename {} \; | sort -n); do
    # Extract its number
    nb=$(sed 's/.*-\([0-9]\+\)[[:alpha:]].*/\1/' <<< "$file")
    # For the first loop, when current_nb is not initialized, we set $nb as default value
    current_nb=${current_nb:-$nb}
    # If we stay on the same line
    if [ "$nb" -eq "$current_nb" ]; then
        # Concatenate the new filename
        line="$line $file"
    else
        # Else append the line at the end of file
        echo $line >> out.txt
        # And prepare the new one
        line="$file "
        current_nb=$nb
    fi
done
# Write the last line, which is still pending when the loop ends
if [ -n "$line" ]; then
    echo $line >> out.txt
fi
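The grouping logic of the loop above can also be written as a function that reads already-sorted names from stdin, which avoids calling sed once per file. This is only a sketch under the same assumptions (names look like [letters]-[numbers][letter].ext); the function name group_by_number is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: reads one file name per line (already sorted) on stdin
# and prints the names sharing the same number on one tab-separated line.
group_by_number() {
    local name nb current_nb="" line=""
    local re='-([0-9]+)[[:alpha:]]'   # digits between the hyphen and the trailing letter
    while IFS= read -r name; do
        if [[ $name =~ $re ]]; then
            nb=${BASH_REMATCH[1]}
        else
            nb=$name                  # no number found: the name forms its own group
        fi
        if [[ -z $line ]]; then
            line=$name                # very first name starts the first line
        elif [[ $nb == "$current_nb" ]]; then
            line+=$'\t'$name          # same number: extend the current line
        else
            printf '%s\n' "$line"     # number changed: emit the finished line
            line=$name
        fi
        current_nb=$nb
    done
    # Emit the last group, which is still pending when the input ends
    [[ -n $line ]] && printf '%s\n' "$line"
    return 0
}
```

It could be used as find . -type f -printf '%f\n' | sort | group_by_number > out.txt (the -printf action assumes GNU find). Because the names are read line by line rather than word-split, spaces in file names survive, although newlines in names still would not.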
Upvotes: 0
Reputation: 113834
To generate the list that you want:
$ find ./BA* -iname "*.jpg" -printf '%f\n' | sort -n
AG-1100A.jpg
AG-1100B.jpg
AG-1100C.jpg
AG-1100D.jpg
AG-1712A.jpg
AG-1712B.jpg
AG-1829A.jpg
AG-1829B.jpg
AG-1829C.jpg
AG-1830A.jpg
AG-1922A.jpg
AG-1922B.jpg
find's -printf feature allows customized output. Since you only want file names without directories, we use the %f format option to -printf.
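One caveat about sort -n here: because every name starts with letters, each line has the numeric value 0 and sort falls back to plain string comparison. That is fine while all the numbers have the same number of digits, as in the example, but a hypothetical AG-999A.jpg would sort after AG-1100A.jpg. A minimal sketch of a numeric sort on the part after the hyphen, assuming each name contains exactly one hyphen (the function name sort_by_number is made up for illustration):

```shell
#!/bin/bash
# Hypothetical helper: sort names from stdin numerically by the number that
# follows the hyphen. Assumes exactly one hyphen per name.
sort_by_number() {
    # -t -    : use the hyphen as the field separator
    # -k 2,2n : compare field 2 (e.g. "1829A.jpg") by its leading number
    # Ties fall back to comparing the whole line, so A sorts before B.
    LC_ALL=C sort -t - -k 2,2n
}
```

It would slot into the pipeline in place of sort -n, e.g. find ./BA* -iname "*.jpg" -printf '%f\n' | sort_by_number.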
If the file names are guaranteed not to contain whitespace or any other shell-active characters, then the following works:
arr=($(find ./BA* -iname "*.jpg" -printf '%f\n' | sort -n))
We can verify that array arr contains what you want via:
$ declare -p arr
declare -a arr=([0]="AG-1100A.jpg" [1]="AG-1100B.jpg" [2]="AG-1100C.jpg" [3]="AG-1100D.jpg" [4]="AG-1712A.jpg" [5]="AG-1712B.jpg" [6]="AG-1829A.jpg" [7]="AG-1829B.jpg" [8]="AG-1829C.jpg" [9]="AG-1830A.jpg" [10]="AG-1922A.jpg" [11]="AG-1922B.jpg")
To handle the most general file names:
array=()
while IFS= read -r -d $'\0'; do
array+=("$REPLY")
done < <(find ./BA* -iname "*.jpg" -printf '%f\0' | sort -zn)
To verify the result:
$ declare -p array
declare -a array=([0]="AG-1100A.jpg" [1]="AG-1100B.jpg" [2]="AG-1100C.jpg" [3]="AG-1100D.jpg" [4]="AG-1712A.jpg" [5]="AG-1712B.jpg" [6]="AG-1829A.jpg" [7]="AG-1829B.jpg" [8]="AG-1829C.jpg" [9]="AG-1830A.jpg" [10]="AG-1922A.jpg" [11]="AG-1922B.jpg")
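The robust version separates the file names with NUL characters, which cannot appear in a file name, so names containing whitespace or newlines are read back intact. A full explanation of how this works can be found here.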
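With bash 4.4 or later, the same NUL-delimited read can be done in one step with mapfile -d ''. This is a sketch rather than a drop-in replacement: the function name collect_names is made up for illustration, it fills the global array named array, and it assumes GNU find and sort.

```shell
#!/bin/bash
# Hypothetical helper: fill the global array "array" with the sorted base
# names of all *.jpg files (any case) below the directories given as arguments.
# Requires bash >= 4.4 (mapfile -d) plus GNU find (-printf) and GNU sort (-z).
collect_names() {
    # -d '' : read NUL-terminated records; -t : drop the terminator from each element
    mapfile -d '' -t array < <(
        find "$@" -type f -iname '*.jpg' -printf '%f\0' | LC_ALL=C sort -z
    )
}
```

After collect_names ./BA*, the result can be inspected with declare -p array just as above.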
Upvotes: 2