camh

Reputation: 42448

Sorting on the last field of a line

What is the simplest way to sort a list of lines, sorting on the last field of each line? Each line may have a variable number of fields.

Something like

sort -k -1

is what I want, but sort(1) does not take negative numbers to select fields from the end instead of the start.

I'd also like to be able to choose the field delimiter.

Edit: To add some specificity to the question: the list I want to sort is a list of pathnames. The pathnames may be of arbitrary depth, hence the variable number of fields. I want to sort on the filename component.

This additional information may change how one manipulates the line to extract the last field (basename(1) may be used), but does not change sorting requirements.

e.g.

/a/b/c/10-foo
/a/b/c/20-bar
/a/b/c/50-baz
/a/d/30-bob
/a/e/f/g/h/01-do-this-first
/a/e/f/g/h/99-local

I want this list sorted on the filenames, which all start with numbers indicating the order the files should be read.

I've added my answer below which is how I am currently doing it. I had hoped there was a simpler way - maybe a different sort utility - perhaps without needing to manipulate the data.

Upvotes: 54

Views: 40699

Answers (11)

dardo82

Reputation: 77

| sed "s#\(.*\)/#\1"$'\x7F'\# \
| sort -t$'\x7F' -k2,2 \
| sed s\#$'\x7F'"#/#"

Still way worse than simple negative field indexes for sort(1) but using the DEL character as delimiter shouldn’t cause any problem in this case.

I also like how symmetrical it is.

Upvotes: 0

Pykler

Reputation: 14835

Here is a Python one-liner version. Note that it assumes the last field is an integer; you can change the key function as needed.

cat file.txt | python3 -c 'import sys; list(map(sys.stdout.write, sorted(sys.stdin, key=lambda x: int(x.rsplit(" ", 1)[-1]))))'

Upvotes: 0

commonpike

Reputation: 11185

I want this list sorted on the filenames, which all start with numbers indicating the order the files should be read.

find . | sed 's#.*/##' | sort

The sed command deletes everything up to and including the last slash on each line. The filenames are what is left, and you sort on those. Note that the output then contains only the filenames, not the full paths.

Upvotes: 1

François Rousseau

Reputation: 381

awk '{print $NF,$0}' file | sort | cut -f2- -d' '

Basically, this command does:

  1. Repeat the last field at the beginning, separated by a space (the default OFS)
  2. Sort; lines with the same last field are then ordered by the full line ($0)
  3. Cut the repeated first field; -f2- means from the second field to the last
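
For the pathnames in the question, the same three steps might look like the sketch below. It assumes "/" as the field separator and filenames that contain no spaces (otherwise the final cut would remove the wrong part). The function name sort_by_basename is made up for illustration, and LC_ALL=C is only there to make the sort order predictable.

```shell
# Hypothetical helper: sort pathnames read from stdin by their last '/'-separated field.
# Assumption: no filename contains a space.
sort_by_basename() {
    awk -F/ '{ print $NF, $0 }' |   # prepend a copy of the filename, then a space
    LC_ALL=C sort |                 # equal filenames fall back to the full path
    cut -d' ' -f2-                  # drop the prepended copy
}

printf '%s\n' /a/b/c/20-bar /a/d/30-bob /a/e/f/g/h/01-do-this-first /a/b/c/10-foo \
| sort_by_basename
```

Changing the -F argument selects a different input delimiter.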

Upvotes: 38

Gabe

Reputation: 86718

Here's a Perl command line (note that your shell may require you to escape the $s):

perl -e "print sort {(split '/', $a)[-1] <=> (split '/', $b)[-1]} <>"

Just pipe the list into it or, if the list is in a file, put the filename at the end of the command line.

Note that this script does not actually change the data, so you don't have to be careful about what delimiter you use.

Here's sample output:

>perl -e "print sort {(split '/', $a)[-1] <=> (split '/', $b)[-1]} <>" files.txt
/a/e/f/g/h/01-do-this-first
/a/b/c/10-foo
/a/b/c/20-bar
/a/d/30-bob
/a/b/c/50-baz
/a/e/f/g/h/99-local

Upvotes: 14

camh

Reputation: 42448

Replace the last delimiter on the line with another delimiter that does not otherwise appear in the list, sort on the second field using that other delimiter as the sort(1) delimiter, and then revert the delimiter change.

delim=/
new_delim=" "
cat $list \
| sed "s|\(.*\)$delim|\1$new_delim|" \
| sort -t"$new_delim" -k 2,2 \
| sed "s|$new_delim|$delim|"

The problem is knowing what delimiter to use that does not appear in the list. You can make multiple passes over the list and then grep for a succession of potential delimiters, but it's all rather nasty - particularly when the concept of "sort on the last field of a line" is so simply expressed, yet the solution is not.

Edit: One safe delimiter to use for $new_delim is NUL, since that cannot appear in filenames, but I don't know how to put a NUL character into a Bourne/POSIX shell script (not bash) or whether sort and sed will handle it properly.
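
A possible workaround, sketched here rather than tested against every sort and sed implementation: shell variables generally cannot hold NUL, but printf with an octal escape can produce another control character. The sketch swaps NUL for the byte 0x01 (Ctrl-A) and assumes that byte never appears in the list. The function name sort_on_last_field is made up for illustration.

```shell
delim=/
# Assumption: byte 0x01 (Ctrl-A) does not occur anywhere in the input.
new_delim=$(printf '\001')

# Hypothetical helper: reads lines on stdin, writes them sorted on the last field.
sort_on_last_field() {
    sed "s|\(.*\)$delim|\1$new_delim|" |      # swap the last delimiter
    LC_ALL=C sort -t"$new_delim" -k 2,2 |     # sort on what follows it
    sed "s|$new_delim|$delim|"                # restore the delimiter
}
```

Used as `sort_on_last_field < "$list"`, this avoids having to scan the list for an unused printable delimiter, at the cost of relying on the stated assumption.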

Upvotes: 2

sarnold

Reputation: 104050

#!/usr/bin/ruby

f = ARGF.read
lines = f.lines

broken = lines.map {|l| l.split(/:/) }

sorted = broken.sort {|a, b|
    a[-1] <=> b[-1]
}

fixed = sorted.map {|s| s.join(":") }

puts fixed

If all the answers involve perl or awk, might as well solve the whole thing in the scripting language. (Incidentally, I tried in Perl first and quickly remembered that I dislike Perl's lists-of-lists. I'd love to see a Perl guru's version.)

Upvotes: 0

ghostdog74

Reputation: 342333

something like this

awk '{print $NF"|"$0}' file | sort -t"|" -k1 | awk -F"|" '{print $NF }'

Upvotes: 9

Thevs

Reputation: 3253

I think the only solution would be to use awk:

  1. Put the last field to the front using awk.
  2. Sort lines.
  3. Put the first field to the end again.
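
The three steps could be sketched roughly as follows, assuming whitespace-separated fields and accepting that runs of whitespace are collapsed to single spaces in the output. The function name rotate_sort is made up for illustration, and LC_ALL=C is only there to make the sort order predictable.

```shell
# Hypothetical helper: sorts whitespace-separated lines from stdin on their last field.
# Assumption: collapsing repeated whitespace in the output is acceptable.
rotate_sort() {
    awk '{ out = $NF; for (i = 1; i < NF; i++) out = out " " $i; print out }' |  # 1. last field to the front
    LC_ALL=C sort |                                                             # 2. sort lines
    awk '{ out = ""; for (i = 2; i <= NF; i++) out = out $i " "; print out $1 }' # 3. first field back to the end
}
```

For example, `rotate_sort < file` would order the lines by their last field, with ties broken by the remaining fields.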

Upvotes: 2

integer

Reputation: 1075

A one-liner in perl for reversing the order of the fields in a line:

perl -lne 'print join " ", reverse split / /'

You could use it once, pipe the output to sort, then pipe it back and you'd achieve what you want. You can change / / to / +/ so it squeezes spaces. And you're of course free to use whatever regular expression you want to split the lines.

Upvotes: 3

Diego Sevilla

Reputation: 29011

sort allows you to specify the delimiter with the -t option, if I remember correctly. To compute the index of the last field, you can count the number of delimiters in a line and add one. Note that this looks at the first line only, so it works only when every line has the same number of fields. For instance (assuming the ":" delimiter):

d=`head -1 FILE | tr -cd :  | wc -c`
d=`expr $d + 1`

($d now contains the last field index).
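
Putting the pieces together might look like the sketch below. It assumes the ":" delimiter and that every line has the same number of fields as the first one, which is not the case for the pathnames in the question. The function name sort_last_field_fixed is made up for illustration.

```shell
# Hypothetical helper: sort a file on its last ':'-separated field.
# Assumption: all lines have as many fields as the first line.
sort_last_field_fixed() {
    file=$1
    d=$(head -n 1 "$file" | tr -cd : | wc -c)   # count delimiters in line 1
    d=$((d + 1))                                # fields = delimiters + 1
    LC_ALL=C sort -t: -k"$d,$d" "$file"         # sort on field $d only
}
```

It would be called as `sort_last_field_fixed FILE`.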

Upvotes: -1
