GreenAsJade

Reputation: 14685

Why doesn't the unix sort process grow as it consumes its input?

I tried this (to observe the behaviour of unix sort):

yes | sort & top

What I see is the system's memory usage growing, as you would expect, but the memory of the sort process itself does not appear to be growing:

Mem:   1689540k total,  1455384k used,   234156k free,   147248k buffers
Swap:  1718268k total,      804k used,  1717464k free,   956216k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
32248 mgregory  20   0 29844  25m  692 R 95.0  1.6   0:32.98 sort               
32247 mgregory  20   0  4036  504  444 S  4.0  0.0   0:01.52 yes             

The number 1455384 (used memory) is growing rapidly.

The number 29844 (sort's VIRT) is not growing.

What is happening there?

Upvotes: 2

Views: 138

Answers (2)

Bellum

Reputation: 175

Unix sort uses an external R-way merge sort. It basically divides the input into smaller portions of similar size (each small enough to fit in memory), sorts each portion, and then merges the sorted portions together at the end.

Except while a portion is actually being sorted, those portions are stored in temporary files on disk (usually in /tmp) rather than in memory. That is why the sort process's memory usage does not keep increasing as it consumes its input.
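
A rough way to watch this yourself (the paths, sizes and input below are just illustrative examples, not anything from the question): cap sort's in-memory buffer with -S, point its temporary directory somewhere visible with -T, and list that directory while the sort runs.

    # Sketch only: paths and sizes are arbitrary examples.
    mkdir -p /tmp/sortdemo
    seq 1 5000000 | shuf > /tmp/sortdemo-input.txt

    # Cap the in-memory buffer at 1 MiB and keep temp files in /tmp/sortdemo.
    sort -S 1M -T /tmp/sortdemo -n /tmp/sortdemo-input.txt > /dev/null &

    # While the sort is running, the intermediate sorted runs show up as temp files:
    ls -lh /tmp/sortdemo

Each temp file is a sorted run of roughly the buffer size; they are merged together at the end.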

But why is the overall memory usage growing? Simply because "unused memory is wasted memory": the Linux kernel keeps large amounts of file data and metadata it has read in the page cache, and only evicts it when something that looks more important needs the space.
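
One way to see that this growth is page cache rather than sort's own memory (a rough sketch; the file path is only a placeholder, and the exact column names depend on your version of free/procps):

    free -m                                   # note the buffers/cached (or "buff/cache") figure
    cat /path/to/some/large/file > /dev/null  # placeholder: read any large existing file into the page cache
    free -m                                   # cached memory grows; no process's RSS has grown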

Upvotes: 1

sehe

Reputation: 393547

Sort doesn't necessarily need to have all the data in memory.

  1. Sort is able to do an external merge sort if the files are too big to fit in memory. I think (IIRC) some of this is described in the man/info pages. Edit, e.g. from the man page:

    --batch-size=NMERGE
          merge at most NMERGE inputs at once; for more use temp files
    -S, --buffer-size=SIZE
          use SIZE for main memory buffer
    
  2. The 1455384k number is likely growing if

    • sort mmaps in more pages than are actually 'reserved' (i.e. locked into the process address space)

    • buffers are counted (as files and data are read, dentries, blocks and inodes are cached). Check this by doing (as root)

      echo 3 > /proc/sys/vm/drop_caches 
      

    and seeing how much memory becomes available again.
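
Spelled out a bit more (a sketch; run as root, and note that dropping caches only throws away clean cached data, it does not free any application memory):

    sync                               # flush dirty pages first so more cache can be dropped
    free -m                            # note the cached / buff-cache figure
    echo 3 > /proc/sys/vm/drop_caches  # drop page cache, dentries and inodes
    free -m                            # the cached memory shows up as free again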

Upvotes: 3
