Reputation: 523

How to remove leading and trailing whitespaces for a specific field from the record using awk command?

I'm trying to remove leading and trailing space in 3rd column of the below text.txt, and append it at the end.:

text.txt ==> Field1 ~ Field2 ~ Field3 ~ Field4 ~ Field5

1 ABC XYZ     ~400 1      ~ 12    A1            ~ TN     ~ USA        
2 ABC XYZ     ~400 1      ~ 12    A1            ~ TN     ~ USA        
3 ABC XYZ     ~400 1      ~ 12    A1            ~ TN     ~ USA        
4 ABC XYZ     ~400 1      ~ 12    A1  B333       ~ TN     ~ USA

Expected output:

1 ABC XYZ     ~400 1      ~ 12    A1            ~ TN     ~ USA~12 A1        
2 ABC XYZ     ~400 1      ~ 12    A1            ~ TN     ~ USA~12 A1        
3 ABC XYZ     ~400 1      ~ 12    A1            ~ TN     ~ USA~12 A1        
4 ABC XYZ     ~400 1      ~ 12    A1  B333       ~ TN     ~ USA~12 A1 B333

However I am getting result as, where the last field all white space got removed including the space which is a word separator.

1 ABC XYZ     ~400 1      ~ 12    A1            ~ TN     ~ USA~12A1        
2 ABC XYZ     ~400 1      ~ 12    A1            ~ TN     ~ USA~12A1        
3 ABC XYZ     ~400 1      ~ 12    A1            ~ TN     ~ USA~12A1        
4 ABC XYZ     ~400 1      ~ 12    A1  B333       ~ TN     ~ USA~12A1B333

I have used the below awk to remove leading and trailing space in 3rd column but it is not working. What am I missing? Code i am using, and all these are removing the white space which are in-between the string. Here I just need to remove the leading and training space, and not white space between the value.

Code 1:

awk 'BEGIN { FS = "~" }; {fullrow=$0}; {gsub(/ /, "", $3) } ; { print fullrow"~"$3 }' text.txt

Code 2:

awk 'BEGIN { FS = "~" }; {fullrow=$0}; {gsub(/^[[:space:]]+|[[:space:]]+/,"",$3) }; { print fullrow"~"$3 } ' text.txt

Code 3:

awk 'BEGIN { FS = "~" }; {fullrow=$0}; {gsub(/^[ \t]+|[ \t]+/,"",$3) }; { print fullrow"~"$3 } ' text.txt

Please help.

Upvotes: 1

Answers (2)

anubhava

Reputation: 785156

You may use:

awk -F '[[:blank:]]*~[[:blank:]]*' -v OFS='~' '{s=$3; gsub(/[[:blank:]]{2,}/, " ", s); sub(/[[:blank:]]+$/, ""); print $0, s}' file

1 ABC XYZ     ~400 1      ~ 12    A1            ~ TN     ~ USA~12 A1
2 ABC XYZ     ~400 1      ~ 12    A1            ~ TN     ~ USA~12 A1
3 ABC XYZ     ~400 1      ~ 12    A1            ~ TN     ~ USA~12 A1
4 ABC XYZ     ~400 1      ~ 12    A1  B333       ~ TN     ~ USA~12 A1 B333

To make it more readable:

awk -F '[[:blank:]]*~[[:blank:]]*' -v OFS='~' '{
   s=$3
   gsub(/[[:blank:]]{2,}/, " ", s)
   sub(/[[:blank:]]+$/, "");
   print $0, s
}' file

Upvotes: 3

RavinderSingh13

Reputation: 133518

With your shown samples, please try following. Written and tested in GNU awk.

awk '
BEGIN{
  FS=OFS="~"
}
{
  val=$3
  gsub(/^ +| +$/,"",val)
  gsub(/[[:space:]]+/," ",val)
  sub(/ +$/,"")
  print $0,val
}
' Input_file

Explanation: Adding detailed explanation for above.

awk '                           ##Starting awk program from here.
BEGIN{                          ##Starting BEGIN section of this program from here.
  FS=OFS="~"                    ##Setting fs and ofs as ~ from here.
}
{
  val=$3                        ##Setting 3rd field value to val here.
  gsub(/^ +| +$/,"",val)        ##Globally substituting starting and ending spaces in val with NULL.
  gsub(/[[:space:]]+/," ",val)
  sub(/ +$/,"")                 ##Substituting trailing spaces with null in current line.
  print $0,val                  ##Printing current line and val here.
}
' Input_file                    ##Mentioning Input_file name here.

Output will be as follows.

1 ABC XYZ     ~400 1      ~ 12    A1            ~ TN     ~ USA~12 A1
2 ABC XYZ     ~400 1      ~ 12    A1            ~ TN     ~ USA~12 A1
3 ABC XYZ     ~400 1      ~ 12    A1            ~ TN     ~ USA~12 A1
4 ABC XYZ     ~400 1      ~ 12    A1  B333       ~ TN     ~ USA~12 A1 B333

Upvotes: 2

How to remove leading and trailing whitespaces for a specific field from the record using awk command?

Answers (2)

Related Questions