arshad
arshad

Reputation: 41

replace tab in an enclosed string in a tab delimited file linux

I have a tab delimited txt file in which third column contains enclosed string that might also has a tab. Because of this extra tab i am getting 5 columns when i try to read this tab delimited file. So i want to replace the tab with space.

Following is the sample file.

col1   col2   col3        col4  
1      abc    "pqr   xyz" asd  
2      asd    "lmn   pqr" aws  
3      abc    "asd"       lmn

I want the output like this

col1   col2   col3        col4  
1      abc    "pqr xyz"   asd  
2      asd    "lmn pqr"   aws  
3      abc    "asd"       lmn

Here is what i have tried

awk -F"\t" '{ gsub("\t","",$3); print $3 }' file.txt

after that i am getting following output

col3  
"pqr  
"lmn  
"asd"

Please help

Upvotes: 1

Views: 2377

Answers (1)

hek2mgl
hek2mgl

Reputation: 157967

Having GNU awk (gawk) you can use the following expression:

gawk '{gsub("\t"," ",$3)}1' OFS='\t' FPAT='"[^"]*"|[^\t]*' file

The key here is the FPAT variable. It defines how a field can look like instead of just specifying the field delimiter.

In our case a field can either be an sequence of non-double-quote chars enclosed in double quotes "[^"]*" or a sequence of zero or more non tab characters [^\t]*. (zero, to handle empty fields properly)

Since we are specifying the sequence of non quote characters first it has a precedence.

Upvotes: 4

Related Questions