user1934428

Reputation: 22311

GNU sort: stray character in field specification

sort doesn't seem to like my key specification. Why?

~/tmp $ sort --version
sort (GNU coreutils) 8.25
Packaged by Cygwin (8.25-1)
~/tmp $ echo 'a;b;c;d;e;f;g'|sort --field-separator=';' --key=1,5,2
sort: stray character in field spec: invalid field specification '1,5,2'

From the man page:

-k, --key=KEYDEF : sort via a key; KEYDEF gives location and type

KEYDEF is F[.C][OPTS][,F[.C][OPTS]] for start and stop position, where F is a field number and C a character position in the field; both are origin 1, and the stop position defaults to the line's end.

Since the .C and OPTS parts of KEYDEF are optional, a key specification of the form F,F,F (i.e. just the field numbers) should be valid. What did I do wrong?

BTW, my environment is Cygwin, running the Z-shell.

Upvotes: 5

Views: 2371

Answers (3)

alex

Reputation: 955

Specifying the "to" part of --key=from,to does make a difference, although a subtle one. For example, this input

3 1 3 4 2
2 2 2 3 4
1 1 1 5 0
2 0 0 3 4
2 1 4 3 4
2 1 6 3 4

would be sorted differently with -k2 than with -k2,2: -k2 compares from field 2 through the end of the line, whereas -k2,2 compares field 2 only. Naming the end field should save sort some work, so I'd use it in production. Omitting it, however, may give more comparable results, so I'd use that form when testing the same dataset.
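To make the difference concrete, here is a small sketch that runs both forms on the sample data above. It assumes GNU sort with the default blank separator and LC_ALL=C so that comparison is bytewise; the variable names are only illustrative.

```shell
# Sample data from this answer (variable names are illustrative).
data='3 1 3 4 2
2 2 2 3 4
1 1 1 5 0
2 0 0 3 4
2 1 4 3 4
2 1 6 3 4'

# Key runs from field 2 to the end of the line.
open_key=$(printf '%s\n' "$data" | LC_ALL=C sort -k2)

# Key is field 2 only; ties fall back to a whole-line comparison.
closed_key=$(printf '%s\n' "$data" | LC_ALL=C sort -k2,2)

printf '%s\n' '-k2:' "$open_key" '-k2,2:' "$closed_key"
```

With -k2, the line `3 1 3 4 2` lands ahead of `2 1 4 3 4` because everything after field 2 takes part in the comparison. With -k2,2 those lines tie on field 2, and sort's last-resort whole-line comparison puts `3 1 3 4 2` after them. Adding -s would disable that last-resort comparison and keep tied lines in input order.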

Upvotes: 0

user1934428

Reputation: 22311

Oops, I should have taken the man page more literally. The definition for KEYDEF says

F[.C][OPTS][,F[.C][OPTS]]

and not

F[.C][OPTS][,F[.C][OPTS]...]

which means that only 1 or 2 fields can be supplied, not an arbitrary number. This explains the error.

As a side note, I believe there is still an error in the man page. The KEYDEF definition says that "the stop position defaults to the line's end". This can't be true, can it? IMO it should be "the stop position defaults to the field's end".

UPDATE: My explanation is NOT correct. See the answer provided by @tedtoal for a correct explanation.

Upvotes: 1

tedtoal

Reputation: 1070

The two fields in the -k argument are the START and END fields of a single key. You can specify -k ANY NUMBER OF TIMES to sort on multiple keys. So -k 1,1 -k 2,2 -k 3,3 will sort first on field 1, then on field 2, then on field 3.
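As a rough illustration, the sketch below sorts a few made-up rows on three keys, using the ';' separator from the question. The sample rows and variable names are hypothetical, and LC_ALL=C is assumed so that comparison is bytewise.

```shell
# Hypothetical sample rows; fields are separated by ';' as in the question.
rows='b;2;x
a;2;y
a;1;z
b;1;w'

# Sort on field 1 first, then field 2, then field 3.
sorted=$(printf '%s\n' "$rows" |
  LC_ALL=C sort --field-separator=';' --key=1,1 --key=2,2 --key=3,3)

# Swapping the key order changes the priority: field 2 first, then field 1.
by_second=$(printf '%s\n' "$rows" |
  LC_ALL=C sort --field-separator=';' --key=2,2 --key=1,1)

printf '%s\n' "$sorted" '---' "$by_second"
```

If the intent behind --key=1,5,2 in the question was to sort on fields 1, 5 and 2 in that priority, the same pattern would presumably be --key=1,1 --key=5,5 --key=2,2.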

Upvotes: 9
