Reputation: 212
I'm trying to transform this data:
EU842263.1.1492 AA.A_1 BB.B_2 CC.C_3
LN612956.1.1401 AA.A_1 BB.B_2 CC.C_3 DD.D_4 EE.E_5 FF.F_6
GU304497.1.1513 AA.A_1
AB905872.1.1334 AA.A_1 BB.B_2 CC.C_3 DD.D_4
Into this:
EU842263.1.1492 AA.A_1
EU842263.1.1492 BB.B_2
EU842263.1.1492 CC.C_3
LN612956.1.1401 AA.A_1
LN612956.1.1401 BB.B_2
LN612956.1.1401 CC.C_3
LN612956.1.1401 DD.D_4
LN612956.1.1401 EE.E_5
LN612956.1.1401 FF.F_6
GU304497.1.1513 AA.A_1
AB905872.1.1334 AA.A_1
AB905872.1.1334 BB.B_2
AB905872.1.1334 CC.C_3
AB905872.1.1334 DD.D_4
How can I achieve this?
Note that the values shown (e.g. AA.A_1) are only placeholders for my real data (for example, 0.15.01610.011_528399).
Upvotes: 1
Views: 374
Reputation: 85895
You can do this in Awk: build a hash map with the value of the first column as the key and the remaining values of the row as the hash value.
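awk '
{
    # Append every field after the first to the entry keyed on column 1
    for (i = 2; i <= NF; i++)
        unique[$1] = (unique[$1] FS $i)
    next
}
END {
    # Note: for (i in unique) visits keys in an unspecified order
    for (i in unique) {
        n = split(unique[i], temp)
        for (j = 1; j <= n; j++)
            print i, temp[j]
    }
}' file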
This should work with the awk available on any POSIX-compliant system.
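One caveat with the code above: awk does not define the order in which for (i in unique) visits keys, so the first-column values may not come out in input order. If each row only needs to be expanded where it stands, a simpler sketch is to print every pair as the line is read, with no array at all. This is an alternative to the hash-map approach rather than a restatement of it, and expand_rows is only a hypothetical wrapper name used for illustration.

```shell
# expand_rows is a hypothetical helper name, not part of the original answer.
# It reads the named files, or stdin when no file is given.
expand_rows() {
    # For each line, pair the first field with every later field
    awk '{ for (i = 2; i <= NF; i++) print $1, $i }' "$@"
}

# Two sample rows taken from the question
printf '%s\n' 'EU842263.1.1492 AA.A_1 BB.B_2 CC.C_3' 'GU304497.1.1513 AA.A_1' | expand_rows
```

Because output is written line by line, input order is preserved. The trade-off is that a key appearing on several input lines is not grouped together, which the hash-map version does do.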
The steps:
for(i=2;i<=NF;i++) runs from column 2 through the last column of each line, and a hash map unique is built, keyed on the value of the first column ($1), with the other columns ($2 through $NF) appended as its value.
END runs after all the lines are processed. We use the split() call to separate the values stored under each key and store them as individual elements in the array temp. We then loop over temp and print the index along with each element of the new array.
Upvotes: 1