mining

Reputation: 3699

How do I sort file paths based on multiple embedded numbers?

I ran a program with different values of the parameters R, C and RP. Each run writes its output to a file named results.txt, in a directory whose name encodes the parameter values.

For instance, in directory name params_R_7_C_16_RP_0, the 7 is the value of parameter R, 16 is the value of parameter C and 0 is the value of parameter RP.

I want to list all results.txt files in the current directory tree, sorted by the values of R, C and RP embedded in the names of their parent directories.

I first use the following command to get the results.txt files that I want to parse:

find ./ -name "results.txt"

and the output is:

./params_R_11_C_9_RP_0/results.txt 
./params_R_7_C_9_RP_0/results.txt
./params_R_7_C_4_RP_0/results.txt
./params_R_11_C_16_RP_0/results.txt 
./params_R_9_C_4_RP_0/results.txt
./params_R_5_C_9_RP_0/results.txt 
./params_R_9_C_25_RP_0/results.txt 
./params_R_7_C_16_RP_0/results.txt 
./params_R_5_C_25_RP_0/results.txt 
./params_R_5_C_16_RP_0/results.txt 
./params_R_11_C_4_RP_0/results.txt
./params_R_9_C_16_RP_0/results.txt
./params_R_7_C_25_RP_0/results.txt
./params_R_11_C_25_RP_0/results.txt 
./params_R_5_C_4_RP_0/results.txt 
./params_R_9_C_9_RP_0/results.txt 

and I tried the following sort command:

find ./ -name "results.txt" | sort

which results in lexical sorting:

./params_R_11_C_16_RP_0/results.txt
./params_R_11_C_25_RP_0/results.txt
./params_R_11_C_4_RP_0/results.txt
./params_R_11_C_9_RP_0/results.txt
./params_R_5_C_16_RP_0/results.txt
./params_R_5_C_25_RP_0/results.txt
./params_R_5_C_4_RP_0/results.txt
./params_R_5_C_9_RP_0/results.txt
./params_R_7_C_16_RP_0/results.txt
./params_R_7_C_25_RP_0/results.txt
./params_R_7_C_4_RP_0/results.txt
./params_R_7_C_9_RP_0/results.txt
./params_R_9_C_16_RP_0/results.txt
./params_R_9_C_25_RP_0/results.txt
./params_R_9_C_4_RP_0/results.txt
./params_R_9_C_9_RP_0/results.txt

But what I actually want is selective numerical sorting: first by R value, then C, then RP:

./params_R_5_C_4_RP_0/results.txt
./params_R_5_C_9_RP_0/results.txt
./params_R_5_C_16_RP_0/results.txt
./params_R_5_C_25_RP_0/results.txt
./params_R_7_C_4_RP_0/results.txt
./params_R_7_C_9_RP_0/results.txt
./params_R_7_C_16_RP_0/results.txt
./params_R_7_C_25_RP_0/results.txt
./params_R_9_C_4_RP_0/results.txt
./params_R_9_C_9_RP_0/results.txt
./params_R_9_C_16_RP_0/results.txt
./params_R_9_C_25_RP_0/results.txt
...

I considered padding the embedded numbers (e.g., params_R_005_C_004_RP_0) when generating the paths list, but that would require an additional processing step, which I want to avoid.

Can the desired sorting be achieved directly?

Upvotes: 3

Views: 873

Answers (3)

mklement0

Reputation: 437268

If you use GNU sort (a recent-enough version), @Fabricator's answer, based on GNU sort's -V option, is by far the simplest solution.

Otherwise, try this POSIX-compliant solution:

 find . -name 'results.txt' | sort -n -t _ -k3,3 -k5,5 -k 7,7
  • -n specifies numeric sorting
  • -t _ splits each input line into fields, using _ as the separator character
  • -k3,3 -k5,5 -k 7,7 sorts the input based first on field 3, then field 5, then field 7, corresponding to the R, C and RP values.
    (Note that using -k with a single number - e.g., -k3 - would instead result in sorting from field 3 through the remainder of the line).
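To see the key selection at work without touching the file system, here is a minimal sketch that feeds a few of the paths from the question through the same field keys. The function name sort_by_params is made up for illustration, and the per-key n modifier is used in place of the global -n option, which should behave the same way here. Field 7 is actually 0/results.txt, but a numeric comparison only reads the leading digits.

```shell
# Hypothetical helper (name is illustrative): sort paths read from stdin
# numerically by the R, C and RP values embedded in the directory name.
sort_by_params() {
  # -t _     : split each line into fields on underscores
  # -k 3,3n  : field 3 only (R), compared numerically
  # -k 5,5n  : field 5 only (C), compared numerically
  # -k 7,7n  : field 7 only (RP plus "/results.txt"); only the leading
  #            digits take part in the numeric comparison
  sort -t _ -k 3,3n -k 5,5n -k 7,7n
}

# Sample input taken from the question, deliberately out of order.
printf '%s\n' \
  ./params_R_11_C_4_RP_0/results.txt \
  ./params_R_5_C_16_RP_0/results.txt \
  ./params_R_7_C_4_RP_0/results.txt \
  ./params_R_5_C_9_RP_0/results.txt |
  sort_by_params
```

This should list the R_5 paths first (C_9 before C_16), then R_7, then R_11. With real data, replace the printf with find . -name 'results.txt'.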

Upvotes: 5

segfault

Reputation: 80

Try this, which bounds each key to a single field and also sorts on the RP field:

find ./ -name "results.txt" | sort -t _ -n -k 3,3 -k 5,5 -k 7,7

Upvotes: 0

Fabricator

Reputation: 12772

You need sort's -V (version sort) flag, which is a GNU extension rather than part of POSIX:

find ./ -name "results.txt" | sort -V
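If the script may run where sort lacks -V, one possible approach is to probe for the option and fall back to explicit numeric keys. The following is only a sketch: the function name sort_results is hypothetical, and the fallback assumes the params_R_<n>_C_<n>_RP_<n> naming from the question with no other underscores earlier in the path. Note also that -V compares every run of digits in the line, so digits in parent directory names would take part in the ordering too.

```shell
# Hypothetical wrapper (name is illustrative): use version sort when the
# local sort accepts -V, otherwise sort on fields 3, 5 and 7 numerically.
sort_results() {
  # Probe with a throwaway input so the function's own stdin is untouched.
  if printf '1\n' | sort -V >/dev/null 2>&1; then
    sort -V
  else
    # Fallback: underscore-separated fields 3 (R), 5 (C) and 7 (RP).
    sort -t _ -k 3,3n -k 5,5n -k 7,7n
  fi
}

find . -name 'results.txt' | sort_results
```

For the directory names in the question, both branches should produce the same order.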

Upvotes: 6
