System

Reputation: 315

How to separate a column into two tab delimited columns in a text file

I have an input file that has 5 columns, but I need to separate the 5th column into two so the output file has a total of 6.

My input file data looks like this:

chrX    100629986   100630758   -   ENSG00000000003.14.IntrontENST00000373020.8.Intron  
chrX    100630866   100632484   -   ENSG00000000003.14.IntrontENST00000373020.8.Intron  
chrX    100632568   100633404   -   ENSG00000000003.14.IntrontENST00000373020.8.Intron

You'll notice that the 5th column has the same structure in all my data, so I want my 5th column to contain "ENSG00000000003.14.Intron" and my 6th column to contain "tENST00000373020.8.Intron".

However, not all of my data has the .Intron tag, for example:

chrX    100597503   100597531   +   ENSG00000000005.5tENST00000485971.1

But you'll notice that all my data has the "t" in front of ENST, so that is what I want to use to separate the columns. I'm not sure how to do this for a file with several hundred thousand lines, and doing it by hand would take far too long. I also need the entire file to be tab-delimited so that I can keep processing it.

Thanks in advance to everyone.

Upvotes: 1

Views: 305

Answers (2)

glenn jackman

Reputation: 247260

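With awk, you would write

awk 'BEGIN { FS = OFS = "\t" } { sub(/tENST/, FS "tENST", $5); print }' file > output

Setting OFS matters: modifying $5 makes awk rebuild the whole line using OFS, which defaults to a single space. With only -F"\t", the output would no longer be tab-delimited.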
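A quick way to check the result before running it on the full file, as a sketch: build a two-line tab-delimited sample from the rows in the question (one with .Intron, one without), split it, and list any line that does not end up with exactly 6 fields. It assumes the real file is tab-delimited; the file names sample.tsv and sample.out are hypothetical. It sets OFS as well as FS because awk rebuilds a modified line with OFS.

```shell
# Two sample rows from the question, written with real tabs (sample.tsv is a hypothetical name).
printf 'chrX\t100629986\t100630758\t-\tENSG00000000003.14.IntrontENST00000373020.8.Intron\n' > sample.tsv
printf 'chrX\t100597503\t100597531\t+\tENSG00000000005.5tENST00000485971.1\n' >> sample.tsv

# Insert a tab before "tENST" in column 5 ("&" is the matched text);
# OFS is a tab so the rebuilt line stays tab-delimited.
awk 'BEGIN { FS = OFS = "\t" } { sub(/tENST/, FS "&", $5); print }' sample.tsv > sample.out

# Report any line that does not have exactly 6 tab-separated fields (no output means all lines split).
awk -F'\t' 'NF != 6 { print NR ": " $0 }' sample.out
```

On the real data, the last command is a cheap way to find rows that have no tENST in column 5 and were therefore left unsplit.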

Upvotes: 2

choroba

Reputation: 242443

You can use sed to insert the tab:

sed 's/tENST/\t&/' < input > output

The first tENST on each line (there is no g flag) is replaced by a tab followed by the matched string (&).

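If your sed doesn't understand \t in the replacement (BSD/macOS sed, for example), try $'s/tENST/\t&/' instead (i.e. prepend a $). With bash's ANSI-C quoting, the shell turns \t into a literal tab before sed ever sees the script.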
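Another portable option, sketched here as an alternative rather than part of the original answer, is to store a literal tab in a shell variable with printf and put it into the sed script, so neither sed nor bash-specific quoting has to interpret \t. The file names sample.tsv and sample.out are hypothetical.

```shell
# A literal tab character; $(...) strips trailing newlines but keeps the tab.
tab=$(printf '\t')

# One sample row from the question (hypothetical file name).
printf 'chrX\t100597503\t100597531\t+\tENSG00000000005.5tENST00000485971.1\n' > sample.tsv

# Same substitution as above, with the tab supplied by the shell.
sed "s/tENST/${tab}&/" sample.tsv > sample.out
```

Because the script is in double quotes here, any $ or backslash added to the pattern later would be expanded by the shell first.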

Upvotes: 5
