Code_So1dier

Reputation: 942

How to make awk not skip empty columns?

Given this input_file:

1234 1234 abcd
1234      abcd

awk doesn't recognise the empty column. When I run:

awk '{print $1,$2}' input_file

I get:

1234 1234
1234 abcd

How can I make awk give me:

1234 1234
1234 

Upvotes: 1

Views: 2801

Answers (3)

david

Reputation: 449

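I think the simplest approach is to declare the field separator as '\t' (assuming the file is indeed tab-delimited). With a tab as the separator, awk no longer collapses runs of delimiters, so an empty column stays an empty field.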

awk -F'\t' '{print $1,$2}' file_name

Your code should now work as expected.
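
A quick way to check this without touching the real file is to feed awk a small tab-delimited sample. This is only a sketch: the sample data is made up with printf, the function name print_first_two is hypothetical, and the angle brackets are there just to make the empty field visible.

```shell
# Sketch: with FS set to a tab, two adjacent tabs delimit an empty field.
# print_first_two is a hypothetical helper name used for illustration.
print_first_two() {
    awk -F'\t' '{print "<" $1 "," $2 ">"}'
}

# Made-up sample: the second record has an empty second column.
printf '1234\t1234\tabcd\n1234\t\tabcd\n' | print_first_two
# prints:
# <1234,1234>
# <1234,>
```

If the real file is padded with spaces rather than tabs, this will not help, because every line is then a single tab-free field.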

Upvotes: 1

paxdiablo

Reputation: 881513

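awk normally uses field separators to decide which characters belong to which field, and with the default separator any run of spaces or tabs counts as a single delimiter. If the empty column in your second line consists only of spaces, there's no way to use that method to split the line as you wish.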

However, GNU awk allows you to set a FIELDWIDTHS variable which will better suit fixed-width data, since that appears to be what you have:

pax> cat infile
1234 5678 abcd
1234      abcd

pax> awk 'BEGIN{FIELDWIDTHS="4 1 4"}{print "<"$1","$3">"}' infile
<1234,5678>
<1234,    >

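It's fields one and three in this case, since field two is the single space between the first and second real columns (the diagram shows how the numbering would continue if the widths covered the whole line):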

1234 5678 abcd
\__/|\__/|\__/
  1 2  3 4  5

I usually do it that way since I don't want the space to become part of the data (in case I want a different separator character in the output, as in my example). If you're passing the space through anyway, you could also use the simpler:

pax> awk 'BEGIN{FIELDWIDTHS="5 4"}{print "<"$1$2">"}' infile
<1234 5678>
<1234     >

In that case, field 1 is the five characters 1234<space>.
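
FIELDWIDTHS is a GNU awk extension. Where gawk is not available, roughly the same result can be had with the standard substr() function. The following is a sketch of that substitute technique, not the FIELDWIDTHS method itself; the function name fixed_width_cols is hypothetical and the column positions are hard-coded for the sample layout above.

```shell
# Sketch: portable fixed-width extraction with substr() instead of FIELDWIDTHS.
# fixed_width_cols is a hypothetical helper name; positions assume the
# "4 chars, 1 space, 4 chars" layout of the sample data.
fixed_width_cols() {
    awk '{
        first  = substr($0, 1, 4)   # characters 1 to 4
        second = substr($0, 6, 4)   # characters 6 to 9, skipping the separator at 5
        print "<" first "," second ">"
    }'
}

printf '1234 5678 abcd\n1234      abcd\n' | fixed_width_cols
# prints:
# <1234,5678>
# <1234,    >
```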


If you want to do fixed width processing but with the ability to easily adapt to later width changes, you can modify the awk script so it gets that information from the file itself.

That information can't come from the actual data lines, since the fields there may contain spaces, but you can add a header line that fully specifies the widths to use (making sure the header line isn't treated as data, of course).

The following transcript shows this in action (the awk script is now in a file since it's getting complex):

pax> cat infile
#### ###### ####
1234 567890 abcd
1234        abcd

pax> cat awkfile.awk
NR == 1 {
    # Header: construct field widths string
    #    "a 1 b 1 c 1 d ... z"
    # where a..z are lengths of fields.

    FIELDWIDTHS = length($1)
    for (i = 2; i <= NF; i++) {
        FIELDWIDTHS = FIELDWIDTHS" 1 "length($i)
    }
    next
}
{
    # Then use that FIELDWIDTHS string for
    # all other records.

    print "<"$1","$3">"
}

pax> awk -f awkfile.awk infile
<1234,567890>
<1234,      >

You'll find that you can change the field lengths as much as you want and, provided the header line is correct, it will adapt.

Upvotes: 3

George Vasiliou

Reputation: 6345

Having the field delimiter double as a field value is more or less impossible, so you need to consider manipulating the input data.

Here are some examples for fixed-width fields:

$ awk '{gsub(" [[:space:]]{4} "," ---- ");print}' file1
1234 1234 abcd
1234 ---- abcd

You can revert the change at any time:

$ awk '{gsub(" [[:space:]]{4} "," ---- ");print}' file1 |awk '{gsub("----","    ");print}'
1234 1234 abcd
1234      abcd

For a non-fixed-width situation, you can use something like the following, which turns a long run of spaces (here, a space, two or more whitespace characters, and another space) into something else:

$ awk '{gsub(" [[:space:]]{2,} "," - ");print}' file
1234 1234 abcd
1234 - abcd
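
To get from the placeholder back to the output the question asks for, the substitution and the column selection can be done in one awk call, since changing $0 with gsub makes awk split the fields again. This is a sketch under assumptions: the helper name pick_two_columns is hypothetical, the blank column is assumed to be exactly four characters wide, no real value equals "----", and a literal six-space pattern is used instead of the interval expression so that it also runs on awk versions without {n} support.

```shell
# Sketch: mark the blank column, select fields, then blank the marker again.
# pick_two_columns is a hypothetical helper name. The six literal spaces are
# separator + 4 blank characters + separator.
pick_two_columns() {
    awk '{
        gsub("      ", " ---- ")              # blank column -> placeholder; $0 is re-split
        col2 = ($2 == "----") ? "" : $2       # map the placeholder back to empty
        print $1 "," col2
    }'
}

printf '1234 1234 abcd\n1234      abcd\n' | pick_two_columns
# prints:
# 1234,1234
# 1234,
```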

Upvotes: 2
