Reputation: 5927
I need to compare two versions of the same file. Both are tab-separated and have this form:
<filename1><tab><Marker11><tab><Marker12>...
<filename2><tab><Marker21><tab><Marker22><tab><Marker23>...
So each row has a different number of markers (the number varies between 1 and 10) and they all come from a small set of possible markers. So a file looks like this:
fileX<tab>Z<tab>M<tab>A
fileB<tab>Y
fileM<tab>M<tab>C<tab>B<tab>Y
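What I need is:
1. the rows sorted by filename (the first column), and
2. the markers within each row sorted alphabetically.
So for the example above, the result would be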
fileB<tab>Y
fileM<tab>B<tab>C<tab>M<tab>Y
fileX<tab>A<tab>M<tab>Z
It's easy to do #1 using sort, but how do I do #2?
UPDATE: This is not a duplicate of this post, since my rows have different lengths and I need each row's markers (the entries after the filename) sorted individually, i.e. the only column that stays in place is the first one.
Upvotes: 1
Views: 111
Reputation: 203324
All you need is:
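awk '
BEGIN { FS = OFS = "\t" }
# Store each marker as an index of arr[filename]; the values stay empty
{ for (i=2;i<=NF;i++) arr[$1][$i] }
END {
    # Visit indices in ascending string order, for filenames and markers alike
    PROCINFO["sorted_in"] = "@ind_str_asc"
    for (i in arr) {
        printf "%s", i
        for (j in arr[i]) {
            printf "%s%s", OFS, j
        }
        print ""
    }
}
' file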
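The above uses GNU awk for true multi-dimensional arrays plus sorted_in. Because the markers are stored as array indices, a marker repeated within a row is printed only once.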
Upvotes: 1
Reputation: 92854
awk solution:
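awk 'BEGIN{ FS=OFS="\t"; PROCINFO["sorted_in"]="@ind_str_asc" }
{ split($0,b,FS); delete b[1]; n=asort(b); r=""
  # asort() renumbers b as 1..n; a numeric loop keeps that order even with 10 markers,
  # whereas for(i in b) under @ind_str_asc would visit index "10" before "2"
  for(i=1;i<=n;i++) r=(i>1)? r OFS b[i] : b[i]
  a[$1] = r
}
END{ for(i in a) print i,a[i] }' file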
The output:
fileB Y
fileM B C M Y
fileX A M Z
PROCINFO["sorted_in"]="@ind_str_asc" - sort mode: for(i in a) visits the indices (the filenames) in ascending string order
split($0,b,FS); - split the line into array b by FS (field separator)
asort(b) - sort marker values (a GNU awk function)
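If GNU awk is not available, the same per-row sort can be sketched with POSIX awk plus sort. This is only a minimal sketch under a few assumptions: the function name sort_rows is hypothetical, markers are compared as plain strings, and rows are short enough (at most 10 markers) that a simple insertion sort over the fields is adequate. Unlike the solutions above, it keeps repeated markers and does not merge rows that share a filename.

```shell
#!/bin/sh
# sort_rows: hypothetical helper; reads tab-separated rows on stdin.
sort_rows() {
  awk 'BEGIN { FS = OFS = "\t" }
  {
    # Insertion sort over fields 2..NF; the filename in $1 stays in place.
    for (i = 3; i <= NF; i++) {
      v = $i
      # Appending "" forces a string comparison rather than a numeric one.
      for (j = i - 1; j >= 2 && ($j "") > (v ""); j--) $(j + 1) = $j
      $(j + 1) = v
    }
    print
  }' | LC_ALL=C sort -t "$(printf '\t')" -k1,1   # then order rows by filename
}

# Usage with the sample data from the question:
printf 'fileX\tZ\tM\tA\nfileB\tY\nfileM\tM\tC\tB\tY\n' | sort_rows
```

Sorting inside awk first and ordering the rows afterwards keeps the two requirements separate, so either step can be swapped out on its own.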
Upvotes: 1