Reputation: 357
I have a line that goes like:
string 2 2 3 3 1 4
where the 2nd, 4th and 6th columns each hold an ID (assume each ID is unique) and the 3rd, 5th and 7th columns hold the data associated with the respective ID.
How can I re-arrange the line so that it will be sorted by the ID?
string 1 4 2 2 3 3
Note: a line may have any number of IDs, unlike the example.
Using a shell script, I'm thinking of something like:
while read n
do
    echo $(echo $n | sort -k (... stuck here) )
done < infile
Upvotes: 1
Views: 143
Reputation: 46836
I'll add a gawk solution to your long list of options.
This is a standalone script:
#!/usr/bin/env -S gawk -f
{
  line=$1
  # Collect the tuples into values of an array,
  for (i=2;i<NF;i+=2) a[i]=$i FS $(i+1)
  # This sorts the array "a" by value, numerically, ascending...
  n=asort(a, a, "@val_num_asc")
  # And this for loop gathers the result (asort() reindexes from 1).
  for (i=1; i<=n; i++) line=line FS a[i]
  # Finally, print the line,
  print line
  # and clear the array for the next round.
  delete a
}
This works by copying your tuples into an array, sorting the array, then reassembling the sorted tuples in a for loop that appends each array element to the output line.
Note that it's gawk-only (not traditional awk) because of the use of asort().
$ cat infile
string 2 2 3 3 1 4
other 5 1 20 9 3 7
$ ./sorttuples infile
string 1 4 2 2 3 3
other 3 7 5 1 20 9
Upvotes: 2
Reputation: 49784
As a bash script this can be done with:
Code:
#!/usr/bin/env bash
# send field pairs as separate lines
function emit_line() {
    while [ $# -gt 0 ] ; do
        echo "$1" "$2"
        shift; shift
    done
}
# break the line into pieces and send to sort
function sort_line() {
    echo $1
    shift
    # -n compares the IDs as numbers rather than as text
    emit_line "$@" | sort -n
}
# loop through the lines in the file and sort by key-value pairs
while read n; do
    echo $(sort_line $n)
done < infile
File infile:
string 2 2 3 3 1 4
string 2 2 0 3 4 4 1 7
string 2 2 0 3 2 1
Output:
string 1 4 2 2 3 3
string 0 3 1 7 2 2 4 4
string 0 3 2 1 2 2
Update:
Cribbing the sort from grail's version, to remove the (much slower) external sort:
function sort_line() {
    line="$1"
    shift
    while [ $# -gt 0 ] ; do
        data[$1]=$2
        shift; shift
    done
    # indexed-array keys come back in ascending numeric order
    for i in ${!data[@]}; do
        line="$line $i ${data[i]}"
    done
    unset data
    echo $line
}
while read n; do
    sort_line $n
done < infile
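The array version relies on the IDs being used as keys, so it inherits the question's assumption that each ID is unique. A rough Python sketch of the same idea (the function name sort_pairs_by_id is made up for illustration, and IDs are assumed to be non-negative integers on a non-empty line) shows what happens when that assumption is broken:

```python
def sort_pairs_by_id(line):
    """Sort the ID/data pairs of one line by numeric ID, keyed on the ID."""
    label, *fields = line.split()
    data = {}
    # Walk the fields two at a time: an ID, then its data.
    for i in range(0, len(fields) - 1, 2):
        # A repeated ID overwrites the earlier entry, as in the array version.
        data[int(fields[i])] = fields[i + 1]
    parts = [label]
    for key in sorted(data):
        parts += [str(key), data[key]]
    return " ".join(parts)


print(sort_pairs_by_id("string 2 2 3 3 1 4"))  # string 1 4 2 2 3 3
print(sort_pairs_by_id("string 2 2 0 3 2 1"))  # string 0 3 2 1
```

With the third line of infile, the pair 2 2 is overwritten by 2 1, whereas the sort-based version keeps both pairs.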
Upvotes: 1
Reputation: 930
Another bash alternative which does not rely on how many ids there are:
#!/usr/bin/env bash
x='string 2 2 3 3 1 4'
out="${x%% *}"
in=($x)
for (( i = 1; i < ${#in[*]}; i += 2 ))
do
    new[${in[i]}]=${in[i+1]}
done
for i in ${!new[@]}
do
    out="$out $i ${new[i]}"
done
echo $out
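You can put a loop around the lot if you then want to read a file; just remember to unset new at the end of each iteration so pairs from one line don't leak into the next.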
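For comparison, here is roughly what looping over a file could look like in Python. This is only a sketch: sort_line and sort_lines are illustrative names, io.StringIO stands in for a real infile, and IDs are assumed to be integers. Because the pairs are rebuilt inside the function for every line, nothing carries over between lines.

```python
import io


def sort_line(line):
    """Return one line with its ID/data pairs ordered by numeric ID."""
    label, *fields = line.split()
    # Pair up even-positioned fields (IDs) with odd-positioned fields (data).
    pairs = list(zip(fields[0::2], fields[1::2]))
    pairs.sort(key=lambda pair: int(pair[0]))
    return " ".join([label] + [item for pair in pairs for item in pair])


def sort_lines(lines):
    """Yield the sorted form of every non-blank line."""
    for line in lines:
        if line.strip():
            yield sort_line(line)


# Stand-in for open("infile"); any iterable of lines works.
infile = io.StringIO("string 2 2 3 3 1 4\nstring 2 2 0 3 4 4 1 7\n")
for result in sort_lines(infile):
    print(result)
```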
Upvotes: 2
Reputation: 49784
You can use Python for this. This function breaks the line up into a list of tuples that can then be sorted. itertools.chain is then used to reassemble the key-value pairs.
Code:
import itertools as it

def sort_line(line):
    # split the line on white space
    x = line.split()
    # make a tuple of key value pairs
    as_tuples = [tuple(x[i:i+2]) for i in range(1, len(x), 2)]
    # sort the tuples by numeric ID, and flatten them with chain
    sorted_kv = list(it.chain(*sorted(as_tuples, key=lambda kv: int(kv[0]))))
    # join the results back into a string
    return ' '.join([x[0]] + sorted_kv)
Test Code:
data = [
    "string 2 2 3 3 1 4",
    "string 2 2 0 3 4 4 1 7",
]
for line in data:
    print(sort_line(line))
Results:
string 1 4 2 2 3 3
string 0 3 1 7 2 2 4 4
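IDs read from a text line are strings, so whether they are compared as strings or as numbers matters once an ID has more than one digit. A minimal sketch of the difference, using the pairs from the "other 5 1 20 9 3 7" line shown in the gawk answer (the variable names are illustrative):

```python
# Pairs as they come out of line.split(): every field is still a string.
pairs = [("5", "1"), ("20", "9"), ("3", "7")]

# Plain sorted() compares the ID strings character by character,
# so "20" lands before "3".
lexical = sorted(pairs)

# Converting the ID to int in the sort key gives numeric order.
numeric = sorted(pairs, key=lambda pair: int(pair[0]))

print(lexical)  # [('20', '9'), ('3', '7'), ('5', '1')]
print(numeric)  # [('3', '7'), ('5', '1'), ('20', '9')]
```

The sample lines in the test data only use single-digit IDs, for which both orderings happen to agree.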
Upvotes: 1