alisrpp

Reputation: 27

Adding rows to a .txt file with two tab-separated columns

I have a two-column, tab-separated .txt file that looks like this:

1.00  GO:0005789,GO:0016021,GO:0005509,GO:0005506
3.33  GO:0005615,GO:0030325,GO:0009653
1.67  GO:0005615,GO:0030325
26.76 GO:0005737,GO:0003993,GO:0004726,GO:0004725

I want to transform it into a two-column .txt file with one GO term per line, like this:

1.00 GO:0005789
1.00 GO:0016021
1.00 GO:0005509
1.00 GO:0005506
3.33 GO:0005615
3.33 GO:0030325
3.33 GO:0009653
1.67 GO:0005615
1.67 GO:0030325
26.76 GO:0005737
26.76 GO:0003993
26.76 GO:0004726
26.76 GO:0004725

I tried sed 's/\(^[^,]*\).*/\1/g' <in.txt, but it deletes every GO term except the first one on each line. It gives me this:

1.00  GO:0005789
3.33  GO:0005615
1.67  GO:0005615
26.76 GO:0005737

Any suggestions? Solutions with or without sed are welcome. Thanks in advance.
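The sed attempt above keeps everything up to the first comma and throws away the rest of the line, which is why only the first GO term survives. A sed-only approach has to loop instead, splitting off one term per pass. The sketch below assumes GNU sed (for -E and for \t and \n inside the expression) and exactly one tab between the two columns; the wrapper name split_go_sed is hypothetical.

```shell
# Hypothetical helper: split_go_sed. Assumes GNU sed and one tab between columns.
split_go_sed() {
  # :a is a loop label. Each pass rewrites "score<TAB>first,rest" as
  # "score<TAB>first<NEWLINE>score<TAB>rest"; ta jumps back to :a as long as
  # the substitution succeeded, so the loop ends once no comma is left.
  sed -E -e ':a' \
         -e 's/^([^\t]+)\t([^,]+),(.*)$/\1\t\2\n\1\t\3/' \
         -e 'ta' "$@"
}
```

Usage: split_go_sed in.txt > out.txt. Lines without a comma pass through unchanged.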

Upvotes: 1

Views: 31

Answers (2)

hek2mgl

Reputation: 158280

Use awk for that:

awk -F',| +|\t' '{for(i=2;i<=NF;i++){print $1" "$i}}' input.txt
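The command above treats a comma, a run of spaces, and a tab as interchangeable separators. A variant that keeps those roles apart lets awk's default whitespace splitting find the two columns and then calls split() on the second column only. This is a sketch under the assumption that the GO list contains no spaces; the function name split_go_awk is hypothetical, and it prints a tab rather than a space between the columns.

```shell
# Hypothetical helper: split_go_awk. Assumes the GO list has no internal spaces.
split_go_awk() {
  # Default FS: any run of spaces/tabs separates fields, so $1 is the score
  # and $2 is the whole comma-separated GO list.
  awk '{
    n = split($2, terms, ",")   # split only the second column on commas
    for (i = 1; i <= n; i++)
      print $1 "\t" terms[i]    # one output line per GO term
  }' "$@"
}

printf '26.76 GO:0005737,GO:0003993\n' | split_go_awk
# prints:
# 26.76	GO:0005737
# 26.76	GO:0003993
```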

Upvotes: 2

zedfoxus

Reputation: 37129

You could use awk for this:

$ cat test.txt
1.00    GO:0005789,GO:0016021,GO:0005509,GO:0005506
3.33    GO:0005615,GO:0030325,GO:0009653
1.67    GO:0005615,GO:0030325
26.76   GO:0005737,GO:0003993,GO:0004726,GO:0004725

$ awk -F'[\t,]' '{for (i=2;i<=NF;i++) print $1"\t"$i }' test.txt

Result:

1.00    GO:0005789
1.00    GO:0016021
1.00    GO:0005509
1.00    GO:0005506
3.33    GO:0005615
3.33    GO:0030325
3.33    GO:0009653
1.67    GO:0005615
1.67    GO:0030325
26.76   GO:0005737
26.76   GO:0003993
26.76   GO:0004726
26.76   GO:0004725

Explanation

  • -F sets the field separator. Here it is the bracket expression [\t,], so both a tab and a comma act as delimiters.
  • NF holds the number of fields on the current line. The loop runs from field 2 through field NF, and for each one prints the first field (the score), a tab, and the current field (one GO term).
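To see what -F'[\t,]' and NF produce for a single line, the small helper below prints NF followed by every field. The name show_fields is hypothetical, and the sample line assumes a single tab between the columns.

```shell
# Hypothetical helper: print NF and each field awk sees with -F'[\t,]'.
show_fields() {
  printf '%s\n' "$1" | awk -F'[\t,]' '{
    printf "NF=%d\n", NF
    for (i = 1; i <= NF; i++) printf "$%d=%s\n", i, $i
  }'
}

show_fields "$(printf '1.67\tGO:0005615,GO:0030325')"
# prints:
# NF=3
# $1=1.67
# $2=GO:0005615
# $3=GO:0030325
```

So for this line the loop in the answer runs for i = 2 and i = 3, printing the score next to each of the two GO terms.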

Upvotes: 1
