Reputation: 315
I have an input file that has 5 columns, but I need to separate the 5th column into two so the output file has a total of 6.
My input file data looks like this:
chrX 100629986 100630758 - ENSG00000000003.14.IntrontENST00000373020.8.Intron
chrX 100630866 100632484 - ENSG00000000003.14.IntrontENST00000373020.8.Intron
chrX 100632568 100633404 - ENSG00000000003.14.IntrontENST00000373020.8.Intron
The 5th column has the same structure throughout my data, so what I want is for the 5th column to contain "ENSG00000000003.14.Intron" and the 6th column to contain "tENST00000373020.8.Intron".
However, not all of my data has the .Intron tag, for example:
chrX 100597503 100597531 + ENSG00000000005.5tENST00000485971.1
But you'll notice that all my data has the "t", so this is what I want to use to separate these columns. I'm unsure how to do this for data that has several hundred thousand lines, and doing it manually would take far too long. I also need the entire file to be tab-delimited so that I can continue processing the data.
Thanks to everyone in advance,
Upvotes: 1
Views: 305
Reputation: 247260
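With awk, you would write the following (setting OFS as well as FS keeps the output tab-delimited once $5 is modified):
awk 'BEGIN { FS = OFS = "\t" } { sub(/tENST/, FS "tENST", $5); print }' file > output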
Upvotes: 2
Reputation: 242443
You can use sed to insert the tab:
sed 's/tENST/\t&/' < input > output
The first tENST string on each line is replaced by a tab followed by the string itself (& stands for the matched text).
If your sed does not interpret \t, you might try $'s/tENST/\t&/' instead (i.e. prepend a $), so that a shell such as bash expands \t to a real tab before sed sees it.
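To check the result before running this on the full file, here is a minimal sketch that applies the same substitution to two of the sample lines from the question and counts the columns. The file names sample.tsv and split.tsv are placeholders, and the input is assumed to be tab-delimited already; the tab is produced by printf, so the sketch does not depend on sed understanding \t.

```shell
# Build a two-line, tab-delimited sample from the question's data
# (sample.tsv is a hypothetical file name).
printf 'chrX\t100629986\t100630758\t-\tENSG00000000003.14.IntrontENST00000373020.8.Intron\n' > sample.tsv
printf 'chrX\t100597503\t100597531\t+\tENSG00000000005.5tENST00000485971.1\n' >> sample.tsv

# Insert a literal tab before the first tENST on each line.
tab=$(printf '\t')
sed "s/tENST/${tab}&/" sample.tsv > split.tsv

# Collect the distinct tab-separated field counts; a correct split
# leaves exactly one value here, the number of columns.
field_counts=$(awk -F'\t' '{ print NF }' split.tsv | sort -u)
echo "$field_counts"
```

If the printed value is anything other than a single 6, some lines either lack the tENST marker or were not tab-delimited to begin with.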
Upvotes: 5