Reputation: 41
I have a tab delimited txt file in which third column contains enclosed string that might also has a tab. Because of this extra tab i am getting 5 columns when i try to read this tab delimited file. So i want to replace the tab with space.
Following is the sample file.
col1 col2 col3 col4
1 abc "pqr xyz" asd
2 asd "lmn pqr" aws
3 abc "asd" lmn
I want the output like this
col1 col2 col3 col4
1 abc "pqr xyz" asd
2 asd "lmn pqr" aws
3 abc "asd" lmn
Here is what i have tried
awk -F"\t" '{ gsub("\t","",$3); print $3 }' file.txt
after that i am getting following output
col3
"pqr
"lmn
"asd"
Please help
Upvotes: 1
Views: 2377
Reputation: 157967
Having GNU awk (gawk) you can use the following expression:
gawk '{gsub("\t"," ",$3)}1' OFS='\t' FPAT='"[^"]*"|[^\t]*' file
The key here is the FPAT
variable. It defines how a field can look like instead of just specifying the field delimiter.
In our case a field can either be an sequence of non-double-quote chars enclosed in double quotes "[^"]*"
or a sequence of zero or more non tab characters [^\t]*
. (zero, to handle empty fields properly)
Since we are specifying the sequence of non quote characters first it has a precedence.
Upvotes: 4