Mohamad Ibrahim

Reputation: 5575

Sorting BIG file based on two columns

I have a big file that does not fit in memory and that I would like to sort. The file consists of two columns and many records. The first column is a number of about 10 digits, and the second column is a string that may contain any printable character (to be sorted by ASCII code).

I need to sort the records by the numeric field, but when two records have the same numeric value I would like to sort them by the string, i.e. the second column.

For that I am trying to use the Linux sort utility, which employs external sorting, but the problem is that it does not sort the strings by ASCII code. Any ideas?

Upvotes: 1

Views: 468

Answers (1)

sagi

Reputation: 5767

The GNU sort utility sorts according to the current locale, which is why the strings do not come out in ASCII order. See the warning in the man page:

   *** WARNING *** The locale specified by the environment affects sort order.  Set LC_ALL=C to get the traditional sort order that uses native byte values.
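
To make the effect of that setting concrete, here is a small sketch with made-up input (the five one-character lines are purely illustrative). With LC_ALL=C the comparison is done on byte values, so uppercase letters come before the underscore, which comes before lowercase letters:

```shell
# Illustrative input only: five one-character lines in arbitrary order.
# LC_ALL=C makes sort compare raw byte values (plain ASCII order).
bytes_order=$(printf '%s\n' 'b' 'A' '_' 'a' 'B' | LC_ALL=C sort)
printf '%s\n' "$bytes_order"
```

This should print A, B, _, a, b, one per line. Under a non-C locale the same input may come out in a different, dictionary-like order.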

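You can use the -n flag to do a numeric sort, or -k to combine numeric and non-numeric keys. For example:

LC_ALL=C sort -k1,1n -k2

will do a numeric sort on the first column, then break ties with a byte-wise (ASCII) sort on the rest of the line starting at the second column. Writing -k1,1 limits the first key to the first field; a bare -k1 would extend the key to the end of the line.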
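
For the layout described in the question (numeric first column, string second column), a minimal end-to-end sketch might look like the following. The file name records.txt, the sample rows, and the use of a scratch directory for -T are assumptions made for illustration; on a real multi-gigabyte file you would point -T at a disk with enough free space for sort's temporary files.

```shell
# Hypothetical sample data: a numeric key, one space, then an arbitrary string.
tmpdir=$(mktemp -d)
printf '%s\n' '20 b' '3 a' '20 B' '3 _x' '20 a' > "$tmpdir/records.txt"

# LC_ALL=C : compare bytes, so the string key follows ASCII order
# -k1,1n   : first key is field 1 only, compared numerically
# -k2      : tie-breaker is field 2 through end of line, compared as bytes
# -T DIR   : where sort writes the temporary files of its external merge
sorted=$(LC_ALL=C sort -T "$tmpdir" -k1,1n -k2 "$tmpdir/records.txt")
printf '%s\n' "$sorted"

rm -rf "$tmpdir"
```

On this sample the rows with key 3 come first ("3 _x" before "3 a"), followed by "20 B", "20 a", "20 b", because 3 is numerically smaller than 20 and ties are broken by byte value.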

Upvotes: 4
