Misha Slyusarev

Reputation: 1383

Trim extra spaces in AWK

I have this AWK script.

awk -v line="    foo    bar  " 'END {
   gsub(/^ +| +$/,"", line);
   gsub(/ {2,}/, " ", line);
   print line
 }' \
somefile.txt

The input file (somefile.txt) is irrelevant to my question. The part that goes after the END pattern is there to trim extra spaces in the line variable and print it out. Like this:

foo bar

I'm trying to see if there is a better, more compact way to do that in AWK. Using gsub to remove a couple of extra spaces is very cumbersome. It is hard to read, and hard for a maintainer to understand what it does (especially one who has never worked with AWK before). Any ideas on how to make it shorter or more explicit?

Thanks!

** EDIT **

The AWK variable line is filtered while awk processes the input file, and I want to trim the extra spaces left after that.

Upvotes: 3

Views: 2578

Answers (6)

David C. Rankin

Reputation: 84642

Another option, using gsub() as you began to do, is:

awk '{gsub(/  +/," "); sub(/^ /,""); sub(/ $/,"")}1' <<< "    foo    bar  "

Here the call to gsub() collapses every run of multiple spaces into a single space, both before and between the fields. The first sub(/^ /,"") then trims the single space that remains at the front of the string, and the final sub(/ $/,"") trims the trailing space.

Either approach works well. Depending on your actual data and your FS value, there may be a preference for one over the other, but without knowing more, they are pretty much a wash.

Example Use/Output

$ awk '{gsub(/  +/," "); sub(/^ /,""); sub(/ $/,"")}1' <<< "    foo    bar  "
foo bar

Upvotes: 7

The fourth bird

Reputation: 163597

For the current example, another option might be to recalculate the text of the input record by first setting the input record to the value of line and then using $1=$1:

awk -v line="    foo    bar  " 'END {$0=line; $1=$1; print}' somefile.txt

Output (the quotes are only for clarity that there are no leading or trailing spaces)

"foo bar"

How the spaces are removed is described in the comments by Ed Morton:

Setting $0=line or any other change to $0 would trigger the fields being recalculated.

Using $1=$1 triggers the record to be recalculated in as much as it'll be rebuilt from the existing fields thereby stripping leading/trailing white space and replacing every other chain of contiguous white space with a single blank char (assuming the default FS and OFS are used).
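As a small sketch of that rebuild step (the variable names default_ofs and custom_ofs are only for illustration, and BEGIN is used so that no input file is needed), the same assignment can be run with the default OFS and with a different OFS to see that the record really is rejoined with whatever OFS holds:

```shell
# Rebuild the record with the default OFS (a single space).
default_ofs=$(awk -v line="    foo    bar  " 'BEGIN { $0 = line; $1 = $1; print "<" $0 ">" }')

# Same rebuild with OFS set to "-": the fields are rejoined with that string instead.
custom_ofs=$(awk -v OFS='-' -v line="    foo    bar  " 'BEGIN { $0 = line; $1 = $1; print "<" $0 ">" }')

echo "$default_ofs"   # <foo bar>
echo "$custom_ofs"    # <foo-bar>
```

So this approach is compact, but its result depends on the FS and OFS in effect when it runs.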

Upvotes: 5

Ed Morton

Reputation: 204548

This works with any awk and any values of FS and OFS, provided the white space in line consists only of blank chars, as the code in your question assumes. Here's how to do it briefly and explicitly, as requested in your question:

gsub(/ +/, " ", line)
gsub(/^ | $/, "", line)

For example, let's say you have a CSV and want to print the number of fields in each line followed by the fields separated by |s. A sample input file would be:

$ cat file
stuff,nonsense

and the awk script to process that would be:

$ awk -v FS=',' -v OFS='|' '
    { print NF, $1, $2 }
' file
2|stuff|nonsense

Now let's introduce your line variable and its associated handling (I added < and > to the output to show that the leading/trailing spaces were stripped):

$ awk -v line='    foo    bar  ' -v FS=',' -v OFS='|' '
    { print NF, $1, $2 }
    END {
        gsub(/ +/, " ", line)
        gsub(/^ | $/, "", line)
        print "<" line ">"
    }
' file
2|stuff|nonsense
<foo bar>

and as you can see everything works exactly as intended while all of the other solutions posted so far would fail in various ways.
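To illustrate one such failure, here is a sketch (my own reduction, run in BEGIN so no input file is needed, with the hypothetical shell variable rebuilt) of the $0=line; $1=$1 approach under the same FS=',' and OFS='|' settings. Since line contains no commas, the whole string is a single field and the rebuild changes nothing:

```shell
# With FS="," the string has exactly one field, so $1=$1 rebuilds $0 unchanged.
rebuilt=$(awk -v line='    foo    bar  ' -v FS=',' -v OFS='|' \
    'BEGIN { $0 = line; $1 = $1; print "<" $0 ">" }')

echo "$rebuilt"   # <    foo    bar  >
```

The two gsub() calls, by contrast, operate on the string directly and never consult FS or OFS.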

If the white space in line isn't all blanks, then with a POSIX awk use [[:space:]] to match any type of white space character (with a non-POSIX awk, replace [[:space:]] with [ \t] to catch the most common chars, blank and tab, and add others as you like):

gsub(/[[:space:]]+/, " ", line)
gsub(/^ | $/, "", line)

Your script:

gsub(/^ +| +$/,"", line);
gsub(/ {2,}/, " ", line);

was lengthier than it had to be because you're doing the gsub()s in the wrong order which necessitates the +s in the first one and unnecessarily checking for 2 or more blanks ({2,}) in the second one. It also wouldn't work if some of the spaces were tabs or some other white space characters.
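If readability for maintainers is the main concern, one possible sketch is to wrap the two calls in a named function. The name squeeze() is hypothetical, not part of awk, and this version uses the portable [ \t] class, so it only handles blanks and tabs:

```shell
squeezed=$(awk -v line="  	 foo 		 bar 	" '
# squeeze(): hypothetical helper that collapses runs of blanks/tabs to one
# blank, then strips the single blank left at either end.
function squeeze(s) {
    gsub(/[ \t]+/, " ", s)   # every run of blanks/tabs becomes one blank
    gsub(/^ | $/, "", s)     # at most one blank remains at each end
    return s
}
BEGIN { print "<" squeeze(line) ">" }
')

echo "$squeezed"   # <foo bar>
```

The caller then reads as line = squeeze(line), which says what it does without requiring the reader to decode the regexps. The test string here contains literal tab characters between the quotes.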

Upvotes: 1

Renaud Pacalet

Reputation: 29345

Using the split function to collect all fields in an array and substr to remove the leading space:

$ awk -vline="    foo    bar  " 'END {s = ""; l = split(line, a)
    for(i = 1; i <= l; i++) s = s " " a[i]; print substr(s, 2) "X"}' /dev/null
foo barX

The trailing X is there to show that the trailing spaces are also removed. Drop it if you decide to use this. Another solution, with patsplit (a GNU awk extension) instead of split:

$ awk -vline="    foo    bar  " 'END {s = ""; l = patsplit(line, a, /[^ ]+/)
    for(i = 1; i <= l; i++) s = s " " a[i]; print substr(s, 2) "X"}' /dev/null
foo barX
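A possible variation on the split version, sketched here on the assumption that the script may have set FS to something else (a comma in this example): passing " " as the third argument to split asks for default white-space splitting regardless of FS. The shell variable joined and the < > markers are only for illustration:

```shell
joined=$(awk -v FS=',' -v line="    foo    bar  " 'BEGIN {
    n = split(line, a, " ")   # " " means default white-space splitting, whatever FS is
    s = ""
    for (i = 1; i <= n; i++)
        s = s (i > 1 ? " " : "") a[i]   # join the pieces with single spaces
    print "<" s ">"
}')

echo "$joined"   # <foo bar>
```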

Upvotes: 1

RavinderSingh13

Reputation: 133760

With your shown samples, please try the following awk program. Since you have an awk variable and are not reading any Input_file, there is no need for an END block; a BEGIN block can work on the variable just as well.

This program creates an awk variable named line. In the BEGIN section it first removes the leading and trailing spaces from line by globally substituting them with an empty string, then globally replaces every run of one or more spaces with OFS (a single space by default), and finally prints the value.

awk -v line="    foo    bar  " '
BEGIN{
  gsub(/^[[:space:]]+|[[:space:]]+$/,"",line)
  gsub(/[[:space:]]+/,OFS,line)
  print line
}
'

Or, if other work happens in your awk program and you want to trim the variable only in the END section, try the following:

awk -v line="    foo    bar  " '
END{
  gsub(/^[[:space:]]+|[[:space:]]+$/,"",line)
  gsub(/[[:space:]]+/,OFS,line)
  print line
}
'  Input_file

Upvotes: 3

James Brown

Reputation: 37464

I'm following the path of @DavidC.Rankin's comment with:

$ awk  -v line="    foo    bar  " '
BEGIN {
    $0=line
    for(i=1;i<=NF;i++)
        printf "%s%s",$i,(i==NF?ORS:OFS)
}'

Output:

foo bar

Upvotes: 4
