Waqas Khokhar

Reputation: 169

Split every cell at colon

I have a tab-delimited file that contains more than 1000 columns and rows. I want to split every cell at the first colon (:) and keep only the part before it. For instance:

Sample Input:

1   55  ./.:.:. 0|0:36:4    0|0:32:9    0|0:30:4    ./.:.:. ./.:.:. ./.:.:. 0|0:32:7
1   56  ./.:.:. ./.:.:. 0|0:32:9    0|0:30:4    ./.:.:. ./.:.:. ./.:.:. 0|0:32:7

Sample output:

1   55  ./. 0|0 0|0 0|0 ./. ./. ./. 0|0
1   56  ./. ./. 0|0 0|0 ./. ./. ./. 0|0

I can do this for a single column with a Python script:

SHORT_col = ((str(cols[2]).split(':'))[0])

But I wonder whether there is a faster way to do this, as addressing more than 1000 columns one by one in a Python script is time-consuming. Your kind help will be highly appreciated.

Upvotes: 1

Views: 56

Answers (1)

marcell

Reputation: 1528

In my interpretation, you need to remove the :anything pattern (as a regex, :.*) from every field. With awk you can do this as follows:

awk '{for (i=1; i<=NF; i++) sub(/:.*/, "", $i); print }' your_data_file

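The for loop iterates over all the fields, and the sub function replaces the first match of the pattern in each field with the empty string "". Because :.* matches from the first colon to the end of the field, everything after the first colon is dropped. Note that modifying a field makes awk rebuild the line with its default output separator, a single space; to keep the tabs, put BEGIN { FS = OFS = "\t" } before the main block.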

Output:

1 55 ./. 0|0 0|0 0|0 ./. ./. ./. 0|0
1 56 ./. ./. 0|0 0|0 ./. ./. ./. 0|0
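
For comparison, the same per-field idea can be sketched in Python without naming each column. This is only a minimal sketch, not a benchmarked replacement for the awk command: the function name strip_after_colon and the sample lines are made up for illustration, and it assumes the fields are separated by single tabs.

```python
def strip_after_colon(line, sep="\t"):
    """Keep only the text before the first colon in every field of one line.

    Hypothetical helper for illustration; assumes `sep` separates the fields.
    """
    fields = line.rstrip("\n").split(sep)
    # partition(":") splits at the first colon only; element [0] is the part
    # before it (or the whole field when there is no colon).
    return sep.join(field.partition(":")[0] for field in fields)


# Made-up sample rows modelled on the question's input.
sample_lines = [
    "1\t55\t./.:.:.\t0|0:36:4\t0|0:32:9\n",
    "1\t56\t./.:.:.\t./.:.:.\t0|0:32:9\n",
]

for line in sample_lines:
    print(strip_after_colon(line))
```

Unlike the awk command above with its default settings, this keeps the tab separators in the output. For a real file, the same function could be applied to each line read from an open file handle.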

Upvotes: 1
