Why does AWK print strange output when I set FS = " " instead of FS = "\t"?

Consider the following data file (cou.data), which has four fields separated by tabs.

Four fields are:

Where a country or continent name consists of two words, the words are separated by a single space.

(The data have not been verified; they are for testing only.)

USSR    8649    275 Asia
Cananda 3852    25  North America
China   3705    1032    Asia
USA 3615    237 North America
Brazil  3286    134 South America
India   1267    746 Asia
Mexico  762 78  North America
France  211 55  Europe
Japan   144 120 Asia
Germany 96  61  Europe
England 94  56  Europe
Taiwan  55  144 Asia
North Korea 44  2134    Asia

awk 'BEGIN { FS = "\t" } { print $1, "---", $4 }' cou.data

The output is exactly what I expected:

USSR --- Asia
Cananda --- North America
China --- Asia
USA --- North America
Brazil --- South America
India --- Asia
Mexico --- North America
France --- Europe
Japan --- Asia
Germany --- Europe
England --- Europe
Taiwan --- Asia
North Korea --- Asia

Then I replaced \t with a single space (" "), that is:

awk 'BEGIN { FS = " " } { print $1, "---", $4 }' cou.data

I do not understand the output I got:

USSR --- Asia
Cananda --- North
China --- Asia
USA --- North
Brazil --- South
India --- Asia
Mexico --- North
France --- Europe
Japan --- Asia
Germany --- Europe
England --- Europe
Taiwan --- Asia
North --- 2134

Lines 2, 4, 5, 7, and 13 each contain one space, and the other lines contain no spaces at all. For the lines that have no spaces, why can $1 and $4 still be printed?

For lines 2, 4, 5, 7, and 13, I thought $1 would be printed like this:

    Cananda 3852    25  North
    USA 3615    237 North
    Brazil  3286    134 South
    Mexico  762 78  North
    North

And $4 would not exist.

Where did I go wrong?

Answers (1)

RavinderSingh13

The problem is that some names contain a space, for example North Korea in the first field and North America in the fourth. When FS is "\t", each such name is treated as a single field. When FS is " ", it is split into two fields. Note that FS = " " is awk's default and is special: it splits on any run of spaces or tabs rather than on one literal space, which is why the lines without any space are still split into fields. That is why the field numbers shift after you change FS: on the Cananda line $4 becomes North, and on the North Korea line $1 is North and $4 is 2134.
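The difference can be checked on a single record. The following is a minimal sketch, assuming a POSIX shell and awk; the variable names (line, tab_fields, space_fields) are only illustrative, and the record is rebuilt with printf so that the tabs are explicit. It prints NF (the number of fields) followed by $1 and $4 for each separator.

```shell
# One record from cou.data, rebuilt with printf so the tabs are explicit.
line=$(printf 'North Korea\t44\t2134\tAsia')

# FS = "\t": only tabs separate fields, so "North Korea" stays in one field.
tab_fields=$(printf '%s\n' "$line" |
    awk 'BEGIN { FS = "\t" } { print NF ": " $1 " --- " $4 }')

# FS = " ": the special default; any run of spaces or tabs separates fields.
space_fields=$(printf '%s\n' "$line" |
    awk 'BEGIN { FS = " " } { print NF ": " $1 " --- " $4 }')

echo "$tab_fields"
echo "$space_fields"
```

With the tab separator the record has 4 fields; with the single-space setting it has 5, because the space inside North Korea also acts as a separator.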

I would suggest keeping your first attempt; it already gives the values you expect.
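For comparison, if splitting on one literal space were really wanted (the behaviour the question expected from FS = " "), one option is to write the separator as a bracket expression, which awk treats as a regular expression instead of the special default. This is only a sketch under the same assumptions (POSIX shell and awk, illustrative variable names), not a replacement for the tab-separated approach.

```shell
# One record from cou.data, rebuilt with printf so the tabs are explicit.
line=$(printf 'Cananda\t3852\t25\tNorth America')

# FS = "[ ]": a regular expression that matches exactly one space,
# so the tabs are no longer separators and stay inside $1.
nf=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "[ ]" } { print NF }')
second=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "[ ]" } { print $2 }')

echo "fields: $nf"
echo "second field: $second"
```

Here the record has only 2 fields: $1 holds everything up to North (tabs included) and $2 is America, so $4 is empty.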
