Reputation: 27
I have a two-column, tab-separated .txt file that looks like this:
1.00 GO:0005789,GO:0016021,GO:0005509,GO:0005506
3.33 GO:0005615,GO:0030325,GO:0009653
1.67 GO:0005615,GO:0030325
26.76 GO:0005737,GO:0003993,GO:0004726,GO:0004725
I want to transform it into a two-column .txt file with one GO term per line, like this:
1.00 GO:0005789
1.00 GO:0016021
1.00 GO:0005509
1.00 GO:0005506
3.33 GO:0005615
3.33 GO:0030325
3.33 GO:0009653
1.67 GO:0005615
1.67 GO:0030325
26.76 GO:0005737
26.76 GO:0003993
26.76 GO:0004726
26.76 GO:0004725
I tried sed 's/\(^[^,]*\).*/\1/g' <in.txt
but it keeps only the first GO term on each line and deletes the rest. It gives me this:
1.00 GO:0005789
3.33 GO:0005615
1.67 GO:0005615
26.76 GO:0005737
Any suggestions? Solutions with or without sed are welcome. Thanks in advance.
Upvotes: 1
Views: 31
Reputation: 158280
Use awk for that:
awk -F',| +|\t' '{for(i=2;i<=NF;i++){print $1" "$i}}' input.txt
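To see how that separator regex splits a line, here is a small sketch that prints each field together with its index. The sample line comes from the question; the function name show_fields is made up for illustration.

```shell
# Hypothetical helper: show every field awk sees when the separator
# is a comma, a run of spaces, or a tab (the same -F as above).
show_fields() {
    awk -F',| +|\t' '{ for (i = 1; i <= NF; i++) print i ": " $i }'
}

# One sample line from the question, with a real tab between the columns.
printf '3.33\tGO:0005615,GO:0030325,GO:0009653\n' | show_fields
```

Field 1 is the number and fields 2 through NF are the individual GO terms, which is why the loop starts at i=2.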
Upvotes: 2
Reputation: 37129
You could use awk for this:
$ cat test.txt
1.00 GO:0005789,GO:0016021,GO:0005509,GO:0005506
3.33 GO:0005615,GO:0030325,GO:0009653
1.67 GO:0005615,GO:0030325
26.76 GO:0005737,GO:0003993,GO:0004726,GO:0004725
$ awk -F'[\t,]' '{for (i=2;i<=NF;i++) print $1"\t"$i }' test.txt
Result:
1.00 GO:0005789
1.00 GO:0016021
1.00 GO:0005509
1.00 GO:0005506
3.33 GO:0005615
3.33 GO:0030325
3.33 GO:0009653
1.67 GO:0005615
1.67 GO:0030325
26.76 GO:0005737
26.76 GO:0003993
26.76 GO:0004726
26.76 GO:0004725
Explanation
-F sets the field delimiters. Two are given here: \t (tab) and , (comma).
NF holds the number of fields on the current line. We loop from field 2 through field NF and, for each one, print the first field followed by the current field.
Upvotes: 1
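As a variation on the same idea, the sketch below keeps tab as the only column separator and uses awk's split() on the second column instead of a combined delimiter. This is not the method from either answer, just an alternative that assumes the input really is tab-separated with exactly two columns; the function name expand_terms is hypothetical.

```shell
# Hypothetical helper: expand a comma-separated second column
# into one output row per item, keeping tab as the column separator.
expand_terms() {
    awk 'BEGIN { FS = OFS = "\t" }
         {
             n = split($2, terms, ",")   # n = number of GO terms on this line
             for (i = 1; i <= n; i++)
                 print $1, terms[i]      # first column, tab, one GO term
         }' "$@"
}

# Reads stdin when no file is given; pass a file name to read a file instead.
printf '1.67\tGO:0005615,GO:0030325\n' | expand_terms
```

Because only the tab separates columns here, spaces or commas that happen to appear in the first column would not be treated as delimiters.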