Reputation: 3

Linux sort -Help Wanted

I've been stuck on a problem for a few days. Here it is; maybe you have bigger brains than me!

I have a bunch of CSV files and I want them concatenated into a single .csv file, sorted numerically. The first problem I ran into is with the ID name (I want to sort only by ID), e.g.:

sort -f *.csv > output.csv - This would work if I had zero-padded IDs like id001, id002, id010, id100, but my IDs are like id1, id2, id10, id100, and this makes my sort inaccurate.

sort -t, -V *.csv > output.csv - This works perfectly on my test machine (sort --version: GNU coreutils 8.5.0), but my live machine at work has sort version 5.3.0 (which does not implement the -V option) and I cannot update it!

I feel like such a noob, and unlucky. If you have a better idea, please bring it on.

My CSV file looks like:

cn41    AQ34070YTW  CDEAQ34070YTW   9C:B6:54:08:A3:C6   9C:B6:54:08:A3:C4        
cn42    AQ34070YTY  CDEAQ34070YTY   9C:B6:54:08:A4:22   9C:B6:54:08:A4:20        
cn43    AQ34070YV1  CDEAQ34070YV1   9C:B6:54:08:9F:0E   9C:B6:54:08:9F:0C        
cn44    AQ34070YV3  CDEAQ34070YV3   9C:B6:54:08:A3:7A   9C:B6:54:08:A3:78        
cn45    AQ34070YW7  CDEAQ34070YW7   9C:B6:54:08:25:22   9C:B6:54:08:25:20

This is actually copied and pasted from a CSV. So let's say this is my first CSV, and the other one looks like:

cn201   AQ34070YTW  CDEAQ34070YTW   9C:B6:54:08:A3:C6   9C:B6:54:08:A3:C4    
cn202   AQ34070YTY  CDEAQ34070YTY   9C:B6:54:08:A4:22   9C:B6:54:08:A4:20        
cn203   AQ34070YV1  CDEAQ34070YV1   9C:B6:54:08:9F:0E   9C:B6:54:08:9F:0C        
cn204   AQ34070YV3  CDEAQ34070YV3   9C:B6:54:08:A3:7A   9C:B6:54:08:A3:78        
cn205   AQ34070YW7  CDEAQ34070YW7   9C:B6:54:08:25:22   9C:B6:54:08:25:20

Looking forward to hearing from you!

Regards

Upvotes: 0

Answers (3)

user4815162342

Reputation: 154886

If your CSV format is fixed, you can use the shell equivalent of the decorate-sort-undecorate pattern:

cat *.csv | sed 's/^,id//' | sort -n | sed 's/^/,id/' >output.csv

The -n option is present even in ancient versions of sort.

UPDATE: the updated input contains a number with a different prefix, and at a different position in the line. Here is a version that handles both kinds of input, as well as other inputs that have a number somewhere in the line, sorting by the first number:

cat *.csv | sed 's/^\([^0-9]*\)\([0-9][0-9]*\)/\2 \1\2/' \
          | sort -n \
          | sed 's/^[^ ]* //' > output.csv
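
To make the three stages easier to follow, here is a minimal sketch of the same pipeline wrapped in a function and run on a few made-up lines. The function name sort_by_first_number and the sample rows are assumptions for illustration only; the sed and sort commands are the ones from the answer.

```shell
# sort_by_first_number: hypothetical helper name, not part of the answer.
# Reads lines on stdin and writes them sorted by the first number in each line.
sort_by_first_number() {
    # decorate: copy the first run of digits to the front, followed by a space
    sed 's/^\([^0-9]*\)\([0-9][0-9]*\)/\2 \1\2/' |
    # sort: -n compares the leading number numerically
    sort -n |
    # undecorate: drop everything up to and including the first space
    sed 's/^[^ ]* //'
}

# made-up sample rows, deliberately out of order
printf '%s\n' 'cn201 x' 'cn41 y' 'cn5 z' | sort_by_first_number
# prints:
# cn5 z
# cn41 y
# cn201 x
```

One caveat with this sketch: a line that contains no digits at all is not decorated, so the final sed would strip its first word. That should not matter for the input shown in the question, where every line starts with an ID.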

Upvotes: 1

fedorqui

Reputation: 289575

You can use -kX.Y to start the sort key at character Y of column X, together with -n for a numeric sort:

sort -t, -k2.3 -n *csv

Given your sample file, it produces:

$ sort -t, -k2.3 -n file
,id1,aaaaaa,bbbbbbbbbb,cccccccccccc,ddddddd
,id2,aaaaaa,bbbbbbbbbb,cccccccccccc,ddddddd
,id10,aaaaaa,bbbbbbbbbb,cccccccccccc,ddddddd
,id40,aaaaaa,bbbbbbbbbb,cccccccccccc,ddddddd
,id101,aaaaaa,bbbbbbbbbb,cccccccccccc,ddddddd
,id201,aaaaaaaaa,bbbbbbbbbb,ccccccccccc,ddddddd

Update

For your given input, I would do:

$ cat *csv | sort -k1.3 -n
cn41    AQ34070YTW  CDEAQ34070YTW   9C:B6:54:08:A3:C6   9C:B6:54:08:A3:C4        
cn42    AQ34070YTY  CDEAQ34070YTY   9C:B6:54:08:A4:22   9C:B6:54:08:A4:20        
cn43    AQ34070YV1  CDEAQ34070YV1   9C:B6:54:08:9F:0E   9C:B6:54:08:9F:0C        
cn44    AQ34070YV3  CDEAQ34070YV3   9C:B6:54:08:A3:7A   9C:B6:54:08:A3:78        
cn45    AQ34070YW7  CDEAQ34070YW7   9C:B6:54:08:25:22   9C:B6:54:08:25:20
cn201   AQ34070YTW  CDEAQ34070YTW   9C:B6:54:08:A3:C6   9C:B6:54:08:A3:C4    
cn202   AQ34070YTY  CDEAQ34070YTY   9C:B6:54:08:A4:22   9C:B6:54:08:A4:20        
cn203   AQ34070YV1  CDEAQ34070YV1   9C:B6:54:08:9F:0E   9C:B6:54:08:9F:0C        
cn204   AQ34070YV3  CDEAQ34070YV3   9C:B6:54:08:A3:7A   9C:B6:54:08:A3:78        
cn205   AQ34070YW7  CDEAQ34070YW7   9C:B6:54:08:25:22   9C:B6:54:08:25:20
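
As a small self-contained check of the key syntax, the sketch below runs the same sort options on three made-up lines that start out of order. The function name sort_cn_lines and the sample rows are assumptions for illustration; it also assumes the prefix is always exactly two characters, as with cn.

```shell
# sort_cn_lines: hypothetical wrapper name, not part of the answer.
# -k1.3 starts the key at the 3rd character of field 1 (just past "cn"),
# and -n compares the number found there numerically.
sort_cn_lines() {
    sort -k1.3 -n "$@"
}

# made-up sample rows, deliberately out of order
printf '%s\n' 'cn201 AQ1' 'cn41 AQ2' 'cn5 AQ3' | sort_cn_lines
# prints:
# cn5 AQ3
# cn41 AQ2
# cn201 AQ1
```

If the prefix length varied from line to line, a fixed character offset like this would no longer line up with the digits.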

Upvotes: 1

myusuf

Reputation: 12240

You could try the -g option:
sort -t, -k 2.3 -g fileName

-t separator
-k key/column
-g general numeric sort
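
A quick sketch of that command on comma-separated lines with the ,idN layout. The function name sort_ids_general and the sample rows are made up for illustration. For plain integers like these, -n would give the same order; it is also worth checking that the installed sort accepts -g before relying on it.

```shell
# sort_ids_general: hypothetical wrapper name, not part of the answer.
# -t, splits fields on commas; the lines start with a comma, so field 1 is empty
# and field 2 is "idN". -k 2.3 starts the key at the 3rd character of field 2,
# and -g compares the number found there as a general numeric value.
sort_ids_general() {
    sort -t, -k 2.3 -g "$@"
}

# made-up sample rows, deliberately out of order
printf '%s\n' ',id10,a' ',id2,b' ',id101,c' ',id1,d' | sort_ids_general
# prints:
# ,id1,d
# ,id2,b
# ,id10,a
# ,id101,c
```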

Upvotes: 0
