Reputation: 2916
Originally, the file has its contents like this:
1.2.3.4: 1,3,4
1.2.3.5: 9,8,7,6
1.2.3.4: 4,5,6
1.2.3.6: 1,1,1
After an incorrect attempt at sorting it, I have this:
1.2.3.4: 1,3,4,4,5,6,
1.2.3.5: 9,8,7,6,
1.2.3.6: 1,1,1,
I want to sort it into the following format:
1.2.3.4: 1,3,4,5,6
1.2.3.5: 6,7,8,9
1.2.3.6: 1
But how do I access each comma-delimited number in each element, sort the numbers in ascending order, and delete the duplicates? The only shell script I have managed to write so far works on the whole element only:
awk -F' ' 'NF>1{a[$1] = a[$1]$2","}END{for(i in a){print i" "a[i] | "sort -t: -k1 "}}' c.txt
Upvotes: 0
Views: 622
Reputation: 44043
EDIT: I took the intermediate data as input the first time around, when the original data was not yet posted, but of course it's also possible to start from the original data. Again with GNU awk (version 4.0 or later, since this uses arrays of arrays):
gawk -F '[ ,]' 'BEGIN { PROCINFO["sorted_in"] = "@ind_num_asc" } { for(i = 2; i <= NF; ++i) a[$1][$i]; } END { for(ip in a) { line = ip " "; for(n in a[ip]) { line = line n "," } sub(/,$/, "", line); print line } }' filename
The code works as follows:
BEGIN {
    PROCINFO["sorted_in"] = "@ind_num_asc"   # GNU-specific: sorted array
                                             # traversal
}
{
    for(i = 2; i <= NF; ++i) a[$1][$i]       # remember numbers by ip
}
END {                                        # in the end:
    for(ip in a) {                           # for all ips:
        line = ip " "                        # construct the line: IP
        for(n in a[ip]) {                    # numbers in order
            line = line n ","
        }
        sub(/,$/, "", line)                  # remove trailing comma
        print line                           # print the result.
    }
}
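If GNU awk is not available, roughly the same result can probably be reached with POSIX awk and sort. The following is only a sketch under a few assumptions: the file name c.txt is taken from the question, the function name dedupe_sorted is made up for illustration, and the IPs are ordered as plain strings rather than octet by octet. It swaps the sorted array traversal for sort -u, so it is a different technique from the gawk script above.

```shell
# Sample input copied from the question (the file name is the one used there).
cat > c.txt <<'EOF'
1.2.3.4: 1,3,4
1.2.3.5: 9,8,7,6
1.2.3.4: 4,5,6
1.2.3.6: 1,1,1
EOF

# dedupe_sorted is a hypothetical helper name, not part of any tool.
dedupe_sorted() {
  # Step 1: emit one "ip number" pair per line, skipping empty fields
  #         (an empty field appears if a line ends in a comma).
  awk -F '[ ,]' '{ for (i = 2; i <= NF; ++i) if ($i != "") print $1, $i }' "$1" |
    # Step 2: sort by ip (as a string), then numerically by number;
    #         -u drops pairs whose keys compare equal.
    LC_ALL=C sort -u -k1,1 -k2,2n |
    # Step 3: glue the numbers of each ip back together with commas.
    awk '{
      if ($1 != prev) {
        if (NR > 1) printf "\n"
        printf "%s %s", $1, $2
        prev = $1
      } else {
        printf ",%s", $2
      }
    }
    END { if (NR > 0) printf "\n" }'
}

dedupe_sorted c.txt
```

For the sample data this prints the three lines asked for in the question: 1.2.3.4: 1,3,4,5,6, then 1.2.3.5: 6,7,8,9, then 1.2.3.6: 1.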
With GNU awk, assuming that the data is formatted precisely as in the question (with a trailing ,):
gawk -F '[ ,]' 'BEGIN { PROCINFO["sorted_in"] = "@ind_num_asc" } { delete a; for(i = 2; i < NF; ++i) a[$i]; line = $1 " "; for(i in a) line = line i ","; sub(/,$/, "", line); print line; }' filename
Each line is split into fields at spaces and commas; the code then works as follows:
BEGIN {
    PROCINFO["sorted_in"] = "@ind_num_asc"   # GNU-specific: sorted array
                                             # traversal, numerically ascending
}
{
    delete a
    for(i = 2; i < NF; ++i) { a[$i] }        # remember the fields in a line.
                                             # duplicates are removed here.
                                             # note that it's < NF instead of
                                             # <= NF because the trailing comma
                                             # leaves us with an empty last
                                             # field.
    line = $1 " "                            # start building line: IP field
    for(i in a) {                            # append numbers separated by
        line = line i ","                    # commas
    }
    sub(/,$/, "", line)                      # remove last trailing comma
    print line                               # print result.
}
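To see why the loop stops at < NF, it may help to look at how the field separator '[ ,]' splits a line with and without the trailing comma. This is a small sketch; count_fields is a hypothetical helper name used only for this illustration.

```shell
# Hypothetical helper: print the field count, then the last field in brackets.
count_fields() {
  printf '%s\n' "$1" | awk -F '[ ,]' '{ print NF; print "[" $NF "]" }'
}

count_fields '1.2.3.4: 1,3,4,4,5,6,'   # prints 8, then []  (empty last field)
count_fields '1.2.3.4: 1,3,4'          # prints 4, then [4]
```

With the trailing comma the last field is empty, so looping up to <= NF would add an empty string to the array; without the trailing comma, < NF would skip the last real number.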
Upvotes: 3