Shubham Gupta

How to split a column that contains multiple dots using the Linux command line

I have a file which looks like this:

chr10:100013403..100013414,- 0 0 0 0
chr10:100027943..100027958,- 0 0 0 0
chr10:100076685..100076699,+ 0 0 0 0

I want the output to look like this:

chr10 100013403 100013414 - 0 0 0 0
chr10 100027943 100027958 - 0 0 0 0
chr10 100076685 100076699 + 0 0 0 0

So I want the first column split into tab-separated fields at each colon (:), comma (,) and double dot (..).
I used awk -F":|," '$1=$1' OFS="\t" file to split the first column, but I am still struggling with the .. characters. I tried awk -F":|,|.." '$1=$1' OFS="\t" file, but it doesn't work.

Answers (2)

RavinderSingh13

If your Input_file is the same as the sample shown, the following should work too.

awk '{gsub(/:|\.+|,/, "\t")} 1' Input_file

Here I use awk's gsub function to globally replace each colon (:), each run of one or more dots (\.+) and each comma (,) with a TAB. The trailing 1 is an always-true condition, so awk prints every line of Input_file, whether or not it was edited. I hope this helps.
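A minimal sketch of the gsub approach run against the three sample lines from the question, assuming a POSIX shell and a standard awk (gawk, mawk or BWK awk). The sample() helper is hypothetical, and the trailing "0 0 0 0" columns are assumed to be space-separated as they appear in the post, so only the first column gains TABs.

```shell
#!/bin/sh
# Hypothetical helper: prints the sample lines from the question
# (assumption: the trailing columns are separated by single spaces).
sample() {
  printf '%s\n' \
    'chr10:100013403..100013414,- 0 0 0 0' \
    'chr10:100027943..100027958,- 0 0 0 0' \
    'chr10:100076685..100076699,+ 0 0 0 0'
}

# gsub replaces every ':' , every run of dots and every ',' with a TAB;
# the pattern 1 is always true, so each (edited) line is printed.
result=$(sample | awk '{gsub(/:|\.+|,/, "\t")} 1')
printf '%s\n' "$result"
```

Because \.+ matches any run of dots, this also copes with a single dot or three dots, but it would also split any dots that happen to appear elsewhere in the line.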

haolee

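The dots in .. should be escaped: in a regular expression an unescaped . matches any single character, so the separator .. matches any two characters rather than two literal dots.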

awk -F':|,|\\.\\.' '$1=$1' OFS="\t" file
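A minimal sketch comparing the escaped separator with the question's unescaped attempt, assuming a POSIX shell and a standard awk; sample() is a hypothetical helper holding the question's data. With the unescaped version, the separator already matches the first two characters of each line ("ch"), so $1 is empty, the pattern '$1=$1' evaluates to false, and, as far as I can tell, no line is printed at all.

```shell
#!/bin/sh
# Hypothetical helper: the sample lines from the question.
sample() {
  printf '%s\n' \
    'chr10:100013403..100013414,- 0 0 0 0' \
    'chr10:100027943..100027958,- 0 0 0 0' \
    'chr10:100076685..100076699,+ 0 0 0 0'
}

# Escaped: after awk's string processing FS is the regexp :|,|\.\.
# so only a literal ':' , ',' or '..' separates fields. Assigning $1=$1
# forces awk to rebuild the record using OFS (a TAB).
fixed=$(sample | awk -F':|,|\\.\\.' '$1=$1' OFS='\t')

# Unescaped (the attempt from the question): '..' means "any two characters",
# $1 comes out empty and the pattern '$1=$1' is false for every line.
broken=$(sample | awk -F':|,|..' '$1=$1' OFS='\t')

printf 'fixed:\n%s\n' "$fixed"
printf 'broken: [%s]\n' "$broken"
```

Note that using '$1=$1' as a pattern also suppresses any line whose first field is empty or 0; writing '{$1=$1} 1' instead would print every line unconditionally.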

As the gawk manual explains: It is important to remember that when you assign a string constant as the value of FS, it undergoes normal awk string processing. For example, with Unix awk and gawk, the assignment FS = "\.." assigns the character string .. to FS (the backslash is stripped). This creates a regexp meaning “fields are separated by occurrences of any two characters.” If instead you want fields to be separated by a literal period followed by any single character, use FS = "\\..".
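A small sketch of the string-processing rule quoted from the manual, assuming a POSIX shell and a standard awk. The test string a..b.c is made up for illustration; the script counts the fields (NF) produced by three different FS string constants.

```shell
#!/bin/sh
# Made-up input: a double dot followed later by a single dot.
line='a..b.c'

# "\\.\\." -> regexp \.\. : a literal ".."               -> fields: a | b.c
nf_literal=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "\\.\\." } { print NF }')

# "\\.."   -> regexp \..  : a period plus any character  -> fields: a | b | (empty)
nf_dot_any=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "\\.." } { print NF }')

# ".."     -> regexp ..   : any two characters           -> four empty fields
nf_any_two=$(printf '%s\n' "$line" | awk 'BEGIN { FS = ".." } { print NF }')

echo "literal: $nf_literal  dot+any: $nf_dot_any  any-two: $nf_any_two"
```

The same doubling applies to -F on the command line, which is why the answer writes -F':|,|\\.\\.' rather than -F':|,|\.\.'.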

https://www.gnu.org/software/gawk/manual/html_node/Field-Splitting-Summary.html
