mahmoudjs14

Reputation: 3

How does the UNIX sort command handle keys of different lengths?

I am trying to sort and join two files that contain IP addresses. The first file has only IPs; the second has IPs and an associated number. But sort orders the two files differently. Here are the commands and their output:

cat file | grep '180.76.15.15' | sort
cat file | grep '180.76.15.15' | sort -k 1
cat file | grep '180.76.15.15' | sort -t ' ' -k 1

outcome:

180.76.15.150 987272
180.76.15.152 52219
180.76.15.154 52971
180.76.15.156 65472
180.76.15.158 35475
180.76.15.15 99709

cat file | grep '180.76.15.15' | cut -d ' ' -f 1 | sort

outcome:

180.76.15.15
180.76.15.150
180.76.15.152
180.76.15.154
180.76.15.156
180.76.15.158

As you can see, the first three commands all produce the same output, but when the lines contain only the IP address, the order changes, which breaks my attempt to join the files.

Specifically, the IP 180.76.15.15 appears in the last row in the first case (even when I sort explicitly on the first field), but in the first row in the second case, and I can't understand why.

Can anyone please explain why this is happening?

P.S. I am connecting over SSH from Windows 10 PowerShell to Ubuntu 20.04 running in VMware.

Upvotes: 0

Views: 94

Answers (1)

thanasisp

Reputation: 5975

sort uses your locale settings to determine the order of characters. From man sort:

*** WARNING *** The locale specified by the environment affects sort order. Set LC_ALL=C to get the traditional sort order that uses native byte values.

With LC_ALL=C you get plain ASCII (byte value) order instead. For example:

> cat file
#a
b#
152
153
15 4
15 1

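Here the lines are sorted in the locale's alphabetical order, which effectively ignores special characters such as # and the space: digits first, then letters. The same thing happens in your first output: with the dots and the space ignored, 180.76.15.15 99709 compares as 18076151599709, which sorts after 1807615158..., so it ends up last.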

thanasis@basis:~/Documents/development/temp> sort file
15 1
152
153
15 4
#a
b#

Here every character counts, compared by byte value: # comes first, then the digits (with the space sorting before any digit), then the letters.

thanasis@basis:~/Documents/development/temp> LC_ALL=C sort file
#a
15 1
15 4
152
153
b#
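
For the join itself, a minimal sketch of how this could be applied, assuming two files shaped like the ones in the question. The file names ips.txt and counts.txt and the sample rows are made up for illustration; the point is only that both sorts and the join run under the same LC_ALL=C setting.

```shell
# Hypothetical sample data standing in for the two files from the question.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' '180.76.15.152' '180.76.15.15' '180.76.15.150' > ips.txt
printf '%s\n' '180.76.15.150 987272' '180.76.15.15 99709' '180.76.15.152 52219' > counts.txt

# Sort both files on the first field only (-k1,1), using byte order.
LC_ALL=C sort -k1,1 ips.txt > ips.sorted
LC_ALL=C sort -k1,1 counts.txt > counts.sorted

# join expects its inputs in the same collation order it uses itself,
# so it is run under the same locale as the two sorts.
joined=$(LC_ALL=C join ips.sorted counts.sorted)
printf '%s\n' "$joined"
```

Note that sort -k 1 means "from field 1 to the end of the line", so it compares the whole line just like a plain sort does. That is why the first three commands in the question give identical output. Writing -k1,1 limits the key to the first field.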

Upvotes: 1
