namesake22

Reputation: 357

Sort a line with a bunch of numbers

I have a line that goes like:

string 2 2 3 3 1 4

where the 2nd, 4th and 6th columns represent an ID (assuming each ID number is unique) and 3rd, 5th and 7th columns represent some data associated with respective ID.

How can I re-arrange the line so that it will be sorted by the ID?

string 1 4 2 2 3 3

Note: a line may have any number of IDs, unlike the example.

Using a shell script, I'm thinking of something like:

while read n    
do
   echo $(echo $n | sort -k (... stuck here) )
done < infile

Upvotes: 1

Views: 143

Answers (4)

ghoti

Reputation: 46836

I'll add a gawk solution to your long list of options.

This is a standalone script:

#!/usr/bin/env gawk -f

{
    line=$1

    # Collect the tuples into values of an array,
    for (i=2;i<NF;i+=2) a[i]=$i FS $(i+1)

    # This sorts the array "a" by value, numerically, ascending,
    # and renumbers its indices from 1 to n...
    n=asort(a, a, "@val_num_asc")

    # And this for loop gathers the result.
    for (i=1; i<=n; i++) line=line FS a[i]

    # Finally, print the line,
    print line

    # and clear the array for the next round.
    delete a
}

This works by copying your tuples into an array, sorting the array, then reassembling the sorted tuples in a for loop that appends the array elements to the output line.

Note that it's gawk-only (not traditional awk) because of the use of asort().

$ cat infile
string 2 2 3 3 1 4
other 5 1 20 9 3 7
$ ./sorttuples infile
string 1 4 2 2 3 3
other 3 7 5 1 20 9
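
The numeric comparison matters as soon as an ID has more than one digit, as in the second sample line. Here is a minimal Python sketch of the difference, using the pairs from that line; the variable names are only illustrative and not part of the gawk script:

```python
# ID/value pairs taken from the sample line "other 5 1 20 9 3 7".
pairs = [("5", "1"), ("20", "9"), ("3", "7")]

# Plain string comparison puts "20" before "3" because "2" < "3".
lexical = sorted(pairs)

# Converting the ID to an integer gives the intended order.
numeric = sorted(pairs, key=lambda kv: int(kv[0]))

print(lexical)
print(numeric)
```

Only the second ordering matches what the numeric sort is meant to produce for that line.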

Upvotes: 2

Stephen Rauch

Reputation: 49784

As a bash script this can be done with:

Code:

#!/usr/bin/env bash

# send field pairs as separate lines
function emit_line() {
    while [ $# -gt 0 ] ; do
        echo "$1" "$2"
        shift; shift
    done
}

# print the leading label, then send the pairs to a numeric sort
function sort_line() {
    echo "$1"
    shift
    emit_line "$@" | sort -n
}

# loop through the lines in the file and sort by key-value pairs;
# the unquoted $(...) joins the sorted pairs back into one line
while read -r n; do
   echo $(sort_line $n)
done < infile

File infile:

string 2 2 3 3 1 4
string 2 2 0 3 4 4 1 7
string 2 2 0 3 2 1

Output:

string 1 4 2 2 3 3
string 0 3 1 7 2 2 4 4
string 0 3 2 1 2 2

Update:

Cribbing the sort from grail's version, to remove the (much slower) external sort:

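function sort_line() {
    local line="$1" i
    local -a data=()
    shift

    # store each value at the array index given by its ID
    while [ $# -gt 0 ] ; do
        data[$1]=$2
        shift; shift
    done

    # the indices of an indexed array expand in ascending order
    for i in "${!data[@]}"; do
        line="$line $i ${data[i]}"
    done
    echo "$line"
}

while read -r n; do
   sort_line $n
done < infile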

Upvotes: 1

grail

Reputation: 930

Another bash alternative which does not rely on how many ids there are:

#!/usr/bin/env bash

x='string 2 2 3 3 1 4'
out="${x%% *}" 

in=($x)

for (( i = 1; i < ${#in[*]}; i += 2 ))
do
  new[${in[i]}]=${in[i+1]}
done

for i in ${!new[@]}
do
  out="$out $i ${new[i]}"
done

echo $out

You can put a loop around the lot if you then want to read a file.
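
The trick here is that the ID is used as the array index, so walking the indices in ascending order yields the pairs already sorted. A rough Python sketch of the same idea follows, with a dict standing in for the sparse bash array; `sort_by_id` is a hypothetical name, and it relies on the question's assumption that IDs are unique (a repeated ID would overwrite the earlier value):

```python
def sort_by_id(line):
    # Hypothetical helper, not part of the bash script above.
    fields = line.split()
    if not fields:
        return ""

    by_id = {}
    # Walk the fields after the label two at a time: ID, then its data.
    for i in range(1, len(fields) - 1, 2):
        by_id[int(fields[i])] = (fields[i], fields[i + 1])

    # Visit the integer keys in ascending order, like the array indices.
    out = [fields[0]]
    for key in sorted(by_id):
        out.extend(by_id[key])
    return " ".join(out)

result = sort_by_id("string 2 2 3 3 1 4")
print(result)
```

Keeping the original ID text alongside the integer key means the output reproduces each ID exactly as it appeared in the input.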

Upvotes: 2

Stephen Rauch

Reputation: 49784

You can use Python for this. This function breaks up the line into a list of (ID, value) tuples that can then be sorted. itertools.chain is then used to re-assemble the key-value pairs.

Code:

import itertools as it

def sort_line(line):
    # split the line on white space
    x = line.split()

    # make a list of (key, value) tuples from the fields after the label
    as_tuples = [tuple(x[i:i+2]) for i in range(1, len(x), 2)]

    # sort the tuples numerically by ID, and flatten them with chain
    sorted_kv = list(it.chain.from_iterable(
        sorted(as_tuples, key=lambda kv: int(kv[0]))))

    # join the results back into a string
    return ' '.join([x[0]] + sorted_kv)

Test Code:

data = [
    "string 2 2 3 3 1 4",
    "string 2 2 0 3 4 4 1 7",
]

for line in data:
    print(sort_line(line))

Results:

string 1 4 2 2 3 3
string 0 3 1 7 2 2 4 4
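
To run this over a file such as the asker's infile instead of a hard-coded list, one possible sketch is below. The names `sort_pairs` and `sort_stream` are hypothetical, and `io.StringIO` only stands in for an opened file so the example is self-contained:

```python
import io

def sort_pairs(line):
    """Return the line with its ID/value pairs ordered by numeric ID."""
    label, *rest = line.split()
    # zip pairs up IDs with values; a trailing unpaired field is dropped.
    pairs = sorted(zip(rest[0::2], rest[1::2]), key=lambda kv: int(kv[0]))
    return " ".join([label] + [field for pair in pairs for field in pair])

def sort_stream(stream):
    """Yield a sorted version of every non-blank line in the stream."""
    for raw in stream:
        if raw.strip():
            yield sort_pairs(raw)

# Assumption: in real use this would be open("infile") or sys.stdin.
infile = io.StringIO("string 2 2 3 3 1 4\nother 5 1 20 9 3 7\n")
sorted_lines = list(sort_stream(infile))
for out in sorted_lines:
    print(out)
```

Because `sort_stream` is a generator, lines are handled one at a time, so the whole file never has to be held in memory.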

Upvotes: 1
