Replace specific commas in a csv file

Question

I have a file like this:

gene_id,transcript_id(s),length,effective_length,expected_count,TPM,FPKM,id
ENSG00000000003.14,ENST00000373020.8,ENST00000494424.1,ENST00000496771.5,ENST00000612152.4,ENST00000614008.4,2.23231E3,2.05961E3,2493,2.112E1,1.788E1,00065a62-5e18-4223-a884-12fca053a109
ENSG00000001084.10,ENST00000229416.10,ENST00000504353.1,ENST00000504525.1,ENST00000505197.1,ENST00000505294.5,ENST00000509541.5,ENST00000510837.5,ENST00000513939.5,ENST00000514004.5,ENST00000514373.2,ENST00000514933.1,ENST00000515580.1,ENST00000616923.4,3.09456E3,2.92186E3,3111,1.858E1,1.573E1,00065a62-5e18-4223-a884-12fca053a109

The problem is that the file should have been tab-delimited rather than comma-delimited, because all the values starting with ENST (i.e. transcript_id(s)) belong together in one column.

The number of ENST IDs is different in each line.

Each ENST ID has the same pattern: it starts with ENST, followed by 11 digits, a period, and then 1-3 digits: ^ENST[0-9]{11}[.][0-9]{1,3}.

I want to convert all the commas between ENST IDs to a : or any other character, so that I can read this as a CSV file. Any help would be much appreciated. Thanks!

mike.dld · Accepted Answer

I imagine something as simple as

sed 's|,ENST|:ENST|g;s|:|,|' < /path/to/your/file

should work. No reason to over-complicate.
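To spell out why the two substitutions are enough, here is a minimal sketch that wraps the same sed command in a function and runs it on a shortened copy of the first data row. The function name join_transcripts is made up for illustration, and the sketch assumes no field in the file already contains a colon.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption); it wraps the sed command above.
# Step 1: s|,ENST|:ENST|g  turns every comma that precedes an ENST ID into ":".
#         This also hits the comma between gene_id and the first ENST ID.
# Step 2: s|:|,|           turns only the first ":" on the line back into ",",
#         restoring the gene_id / transcript_id(s) boundary.
# Assumes no field contains ":" to begin with.
join_transcripts() {
  sed 's|,ENST|:ENST|g;s|:|,|'
}

# Shortened sample row (two transcript IDs instead of five).
printf '%s\n' 'ENSG00000000003.14,ENST00000373020.8,ENST00000494424.1,2.23231E3,2.05961E3,2493,2.112E1,1.788E1,00065a62-5e18-4223-a884-12fca053a109' | join_transcripts
# prints:
# ENSG00000000003.14,ENST00000373020.8:ENST00000494424.1,2.23231E3,2.05961E3,2493,2.112E1,1.788E1,00065a62-5e18-4223-a884-12fca053a109
```

The header line contains neither ",ENST" nor ":", so it passes through unchanged, and every row ends up with the same eight comma-separated fields.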
