Reputation: 169
I have a tab-delimited file that contains more than 1000 columns and rows. I want to split every cell at the first colon (:) and keep only the part before it. For instance:
Sample Input:
1 55 ./.:.:. 0|0:36:4 0|0:32:9 0|0:30:4 ./.:.:. ./.:.:. ./.:.:. 0|0:32:7
1 56 ./.:.:. ./.:.:. 0|0:32:9 0|0:30:4 ./.:.:. ./.:.:. ./.:.:. 0|0:32:7
Sample output:
1 55 ./. 0|0 0|0 0|0 ./. ./. ./. 0|0
1 56 ./. ./. 0|0 0|0 ./. ./. ./. 0|0
I can do this with a Python script:
SHORT_col = ((str(cols[2]).split(':'))[0])
But I wonder whether there is a faster way to do this, since handling more than 1000 columns one by one in a Python script is time-consuming. Any help would be highly appreciated.
Upvotes: 1
Views: 56
Reputation: 1528
In my interpretation, you need to remove the :anything pattern (as a regex: :.*) from every field. With awk you can achieve this in the following way:
awk 'BEGIN { FS = OFS = "\t" } { for (i = 1; i <= NF; i++) sub(/:.*/, "", $i); print }' your_data_file
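The for loop iterates through all the fields, and the sub function replaces the first match of the pattern in each field with an empty string "". Because :.* is greedy, that single match covers everything from the first colon to the end of the field.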
Output:
1 55 ./. 0|0 0|0 0|0 ./. ./. ./. 0|0
1 56 ./. ./. 0|0 0|0 ./. ./. ./. 0|0
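If you would rather stay in Python, the same idea can be sketched with one regex substitution per line instead of a split per cell. This is only a sketch: the names strip_after_colon and process are made up for illustration, and it assumes the fields are separated by tabs and never contain a tab themselves. It will probably be faster than looping over 1000+ columns, but I have not benchmarked it on your data.

```python
import re

# Matches a colon plus everything after it up to the next tab or newline,
# i.e. the ":anything" tail of a single tab-separated field.
_AFTER_COLON = re.compile(r":[^\t\n]*")


def strip_after_colon(line):
    """Return the line with every field cut at its first colon.

    Hypothetical helper; assumes tab-separated fields.
    """
    return _AFTER_COLON.sub("", line)


def process(src, dst):
    """Copy lines from src to dst, trimming each field at its first colon.

    src is any iterable of lines, dst is any object with a write() method.
    """
    for line in src:
        dst.write(strip_after_colon(line))
```

To use it on a file, import sys, call process(sys.stdin, sys.stdout) at the end of the script, and run it as python your_script.py < your_data_file. Fields without a colon, such as the first two columns, pass through unchanged, and the tab delimiters are kept as they are.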
Upvotes: 1