Hays Lo

Reputation: 85

How to sort filenames that are numbered with various conventions

This is similar to my previous question, but I think it's more different than it is similar.

I am working in Bash with a folder of numbered files, all of the same file type. The filenames cannot contain spaces. I need to sort the files by number. For example, if the folder contained:

1.jpg
1.2.jpg
ch.002.jpg
Chapter3.jpg
Chapter_004:_Blah.jpg
Chapter_4.1.jpg
Chapter_5.jpg
Chapter_5.005.jpg

The resulting string would be

"1.jpg 1.2.jpg ch.002.jpg Chapter3.jpg Chapter_004:_Blah.jpg Chapter_4.1.jpg Chapter_5.jpg Chapter_5.005.jpg"

As you can see, I need support for floats, numbers with leading zeros, and regular numbers.

Upvotes: 2

Views: 86

Answers (2)

dawg

Reputation: 104092

Since you have stated that "Those files have filenames that cannot contain spaces", you can use a fairly simple Bash pipeline to build a DSU (Decorate-Sort-Undecorate, also known as a Schwartzian transform) from echo *, gawk, and a numerical sort. (If the filenames could contain spaces, the method is the same, but we would need a Bash loop instead of tr ' ' '\n' to delimit each filename.)

Given these files:

$ echo *
1.2.jpg 1.jpg Chapter3.jpg Chapter_004:_Blah.jpg Chapter_4.1.jpg Chapter_5.005.jpg Chapter_5.jpg ch.002.jpg

You can do:

$ echo * | tr ' ' '\n' | gawk '{match($0, /([0-9]+\.{0,1}[0-9]*)/, arr); print arr[1] "/" $0}' | sort -n | awk -F"/" '{print $NF}' | tr '\n' ' '
1.jpg 1.2.jpg ch.002.jpg Chapter3.jpg Chapter_004:_Blah.jpg Chapter_4.1.jpg Chapter_5.jpg Chapter_5.005.jpg 

Whatever conditions you want to add to the decoration portion, you would add by changing the regex in match($0, /([0-9]+\.{0,1}[0-9]*)/, arr) to capture that portion from the file name.

In Unix, the character / cannot appear in a filename and is therefore an effective delimiter between the decoration and the filename. We then sort numerically on the float interpretation of the decoration and remove the decoration with a second awk.
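For illustration, here is a rough sketch of the same decorate, sort, undecorate stages wrapped in a function. The name sort_by_number is hypothetical. As assumptions that differ from the pipeline above, it uses POSIX awk's RSTART/RLENGTH instead of gawk's three-argument match, a slightly tighter regex so the key never ends in a bare dot, and cut for the undecorate step. Because it reads one name per line, it also copes with names that contain spaces.

```shell
# sort_by_number (hypothetical helper): reads one filename per line on stdin
# and writes the names sorted by the first number found in each name.
#   1. awk decorates each name as "<number>/<name>" (0 if no number is found)
#   2. sort orders the lines numerically on the decoration only
#   3. cut removes the decoration, up to and including the first "/"
sort_by_number() {
    awk '{
        key = 0
        if (match($0, /[0-9]+(\.[0-9]+)?/)) key = substr($0, RSTART, RLENGTH)
        print key "/" $0
    }' | LC_ALL=C sort -t/ -k1,1n | cut -d/ -f2-
}

# Demo with the names from the question, fed one per line by printf.
printf '%s\n' 1.2.jpg 1.jpg Chapter3.jpg Chapter_004:_Blah.jpg \
    Chapter_4.1.jpg Chapter_5.005.jpg Chapter_5.jpg ch.002.jpg |
    sort_by_number | tr '\n' ' '
# prints: 1.jpg 1.2.jpg ch.002.jpg Chapter3.jpg Chapter_004:_Blah.jpg Chapter_4.1.jpg Chapter_5.jpg Chapter_5.005.jpg
```

In a real folder, printf '%s\n' * | sort_by_number would feed the glob results to the function. Names containing newlines would still break this approach.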

If you want to sort on multiple conditions, have the first awk emit multiple decorations, each separated by a character that cannot appear in a filename. Then pass the corresponding multi-field key arguments to sort, and undecorate with the last awk command so that only the filename is printed.
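A minimal sketch of a two-key decoration, assuming a secondary condition the question does not ask for: names that share the same number are ordered case-insensitively. The function name sort_by_number_then_name is hypothetical, and POSIX awk's RSTART/RLENGTH stands in for gawk's three-argument match.

```shell
# sort_by_number_then_name (hypothetical helper): reads one filename per line.
# Each line is decorated as "<number>/<lowercased name>/<name>".
# sort uses field 1 numerically, then field 2 as a plain string tiebreak,
# and the final awk prints only the last "/"-separated field, the real name.
sort_by_number_then_name() {
    awk '{
        key = 0
        if (match($0, /[0-9]+(\.[0-9]+)?/)) key = substr($0, RSTART, RLENGTH)
        print key "/" tolower($0) "/" $0
    }' | LC_ALL=C sort -t/ -k1,1n -k2,2 | awk -F/ '{print $NF}'
}

# Demo: B2.jpg and a2.jpg share the number 2, so the lowercased name decides.
printf '%s\n' B2.jpg a2.jpg a1.jpg | sort_by_number_then_name | tr '\n' ' '
# prints: a1.jpg a2.jpg B2.jpg
```

Without the second decoration, a plain byte-order tiebreak in the C locale would place B2.jpg before a2.jpg.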

Upvotes: 1

Cyrus

Reputation: 88949

A Schwartzian transform with paste, GNU sort and bash, reading the filenames one per line from a file named file:

paste <(tr -cd '[0-9.\n]' < file | sort -V) file | awk '{print $2}' | tr '\n' ' '

Output:

1.jpg 1.2.jpg 002.jpg Chapter3.jpg Chapter_004:_Blah.jpg Chapter_4.1.jpg Chapter_5.jpg Chapter_5.005.jpg 

Upvotes: 3
