bnoeafk

Reputation: 539

Linux sort showing odd behavior

I have a script that needs to cat a number of files in numerical order. While it works fine with a couple of hundred files, I am now getting some "interesting" results when handling a larger file.

The file in question has been split into 1289 individual files, named ABC.001-1289 to ABC.1289-1289.

I'm using "ls -gGo ABC* | sort -hk9" to list the files in, what I would deem to be, a human readable sort order. All goes swimmingly until I hit ABC.763-1289:

ABC.001-1289 .. ABC.763-1289
ABC.1000-1289 .. ABC.1040-1289 
ABC.764-1289 .. ABC.999-1289
ABC.1041-1289 .. ABC.1289-1289

I'm thinking some sort of buffer overrun or something, but I've not experienced anything like this before and am kinda scratching my head as to where I would even start looking to remedy the issue.

I've tried altering the "k" value and even removing it, with little positive outcome.

The more I look into this, the more I believe a KEYDEF is required, but I can't work out the correct format for it.

Any thoughts?

Upvotes: 2

Views: 124

Answers (2)

Mathias

Reputation: 1500

A little hacky but try this:

ls -gGo ABC* | cut -d "." -f 2 | sort -h

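or, on plain ls output (the byte offset only lines up with the name when the long-listing columns are absent):

ls ABC* | cut -b 5- | sort -h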
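As a rough illustration of why pulling the number out first helps: assuming `ls -gGo` prints seven whitespace-separated fields per line (mode, link count, size, month, day, time, name), there is no ninth field for `-k9` to select, so the number in the file name never takes part in the comparison. The sketch below models that in Python. The listing lines are made up, and `part_number` is a hypothetical helper that reads the name from the last field instead of cutting the whole line, so it is an analogue of the pipeline above, not a copy of it.

```python
# Hypothetical `ls -gGo` lines; sizes and times are made up for illustration.
listing = [
    "-rw-r--r-- 1 1048576 Jan  5 10:02 ABC.1000-1289",
    "-rw-r--r-- 1 1048576 Jan  5 10:01 ABC.764-1289",
    "-rw-r--r-- 1 1048576 Jan  5 10:00 ABC.001-1289",
]

# Whitespace-separated fields: mode, links, size, month, day, time, name.
field_counts = {len(line.split()) for line in listing}


def part_number(line):
    """Return the integer between the first '.' and the following '-' of the name."""
    name = line.split()[-1]  # assumes file names contain no spaces
    return int(name.split(".", 1)[1].split("-", 1)[0])


# Sort the lines by the extracted number, then keep only the names.
ordered = [line.split()[-1] for line in sorted(listing, key=part_number)]
print(field_counts)
print(ordered)
```

With these sample lines every entry has seven fields, and sorting on the extracted number puts 764 ahead of 1000.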

Upvotes: 1

Amnon Harel

Reputation: 103

I wouldn't want to start debugging the behavior of the sort utility (it is a separate program, not part of the shell). So why not just do the sorting somewhere else? For example, I'd use Python:

#!/usr/bin/env python3
import argparse, sys, re

parser = argparse.ArgumentParser( description='concatenate the input files by order',
                                  formatter_class=argparse.ArgumentDefaultsHelpFormatter )
parser.add_argument( 'input', nargs='+', help='the paths to the files to be concatenated' )
parser.add_argument( '-n','--nosort', action='store_true', help='use the given order instead of sorting' )
parser.add_argument( '-o','--output', default='', help='output file. Will output to stdout if empty' )
args = parser.parse_args()

def human_keys( astr ):
    """
    alist.sort(key=human_keys) sorts in human order
    From unutbu @ http://stackoverflow.com/questions/5254021
    """
    keys=[]
    for elt in re.split( r'(\d+)', astr ):  # raw string, so \d is not treated as a string escape
        elt = elt.swapcase()
        try: 
            elt = int(elt)
        except ValueError: 
            pass
        keys.append( elt )
    return keys

if not args.nosort:
    args.input.sort( key = human_keys )

output = open( args.output, 'w' ) if args.output else sys.stdout

for path in args.input:
    with open( path, 'r' ) as in_file:
        for line in in_file:
            output.write(line)

if output is not sys.stdout:
    output.close() # not strictly needed, but tidier. An "atexit" handler would be overkill.
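To see what the key function does with names like the ones in the question, here is a minimal standalone sketch. It assumes Python 3, uses a compact variant of `human_keys` (not the exact code above), and the file names are samples picked from the question's range.

```python
import re


def human_keys(astr):
    """Split into text and integer chunks so that 764 sorts before 1000."""
    return [int(chunk) if chunk.isdigit() else chunk.swapcase()
            for chunk in re.split(r"(\d+)", astr)]


names = ["ABC.1000-1289", "ABC.764-1289", "ABC.001-1289", "ABC.1289-1289"]
ordered = sorted(names, key=human_keys)
print(ordered)
```

Because the split uses a capturing group, text and digit chunks alternate in every key, so integers are only ever compared with integers and strings with strings.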

Upvotes: 1
