ALG

Reputation: 212

Transpose rows to columns using the first column as reference in unix shell

I'm trying to transform this data:

EU842263.1.1492 AA.A_1 BB.B_2 CC.C_3
LN612956.1.1401 AA.A_1 BB.B_2 CC.C_3 DD.D_4 EE.E_5 FF.F_6
GU304497.1.1513 AA.A_1
AB905872.1.1334 AA.A_1 BB.B_2 CC.C_3 DD.D_4

Into this:

EU842263.1.1492 AA.A_1
EU842263.1.1492 BB.B_2
EU842263.1.1492 CC.C_3
LN612956.1.1401 AA.A_1
LN612956.1.1401 BB.B_2
LN612956.1.1401 CC.C_3
LN612956.1.1401 DD.D_4
LN612956.1.1401 EE.E_5
LN612956.1.1401 FF.F_6
GU304497.1.1513 AA.A_1
AB905872.1.1334 AA.A_1
AB905872.1.1334 BB.B_2
AB905872.1.1334 CC.C_3
AB905872.1.1334 DD.D_4

How can I achieve this?

Note that the values shown (AA.A_1) are only placeholders for my real data (for example, 0.15.01610.011_528399).

Upvotes: 1

Views: 374

Answers (1)

Inian

Reputation: 85895

You can do this in Awk: build a hash-map keyed on the value in the first column, with the remaining columns of that row stored as the hash value.

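awk '
   {
       # Remember each key the first time it appears so output follows input order
       if (!seen[$1]++)
           order[++count] = $1
       # Append columns 2..NF to the string stored under the first column
       for (i = 2; i <= NF; i++)
           unique[$1] = (unique[$1] FS $i)
   } END {
       for (k = 1; k <= count; k++) {
           # Break the stored string back into its individual values
           n = split(unique[order[k]], temp)
           for (j = 1; j <= n; j++)
               print order[k], temp[j]
       }
   }' file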

This should work with any POSIX-compliant awk.

The steps:

  • The loop for(i=2;i<=NF;i++) runs from column 2 to the last column ($NF) of each line and appends each value to the hash-map unique, under the key taken from the first column ($1).
  • The part under END runs after all the lines are processed. For each key, the split() call breaks the stored string apart and stores the pieces as individual elements in the array temp.
  • Then we loop over the elements of temp and print the key along with each element.
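
If rows that share a first column do not need to be merged, the same pairing can be sketched without a hash-map by printing each pair as the line is read. This is an alternative to the approach above rather than the same method: it keeps the input order and does no grouping. The function name transpose_pairs is a hypothetical helper introduced only for illustration, and the two sample rows are taken from the question.

```shell
# Hypothetical helper: print "first-column value" once per remaining column.
# Lines are handled as they are read, so output follows input order.
transpose_pairs() {
    awk '{ for (i = 2; i <= NF; i++) print $1, $i }' "$@"
}

# Two sample rows from the question, fed on standard input.
printf '%s\n' \
    'EU842263.1.1492 AA.A_1 BB.B_2 CC.C_3' \
    'GU304497.1.1513 AA.A_1' |
    transpose_pairs
```

A line that contains only the first column produces no output in this sketch, since there is nothing to pair it with.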

Upvotes: 1
