sohail

Reputation: 19

Understanding the split function in awk (shell script)

RequestID       CustomerID      Status
101     101111  Error
102     323232  Success
103     33434   Error

I'm trying to print the first and second fields using the split function. The delimiter in the data above is a tab. I know there are other ways to do this, but I'm trying to learn the split function in awk. I'm trying the code below:

awk '{split($1,a,"\t");split($2,b,"\t");print a[1], b[2]}' data

The above code prints only the first column ($1), not the second column ($2). Is there a specific reason why?

Thanks,

Upvotes: 0

Views: 3947

Answers (4)

Ed Morton

Reputation: 204035

split takes 3 arguments:

  1. mandatory: the string to be split
  2. mandatory: the array to populate with the sub-strings that result from splitting the original string
  3. optional: the regular expression to use when splitting the string, FS if absent.
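As a minimal sketch of those three arguments (the sample line is an assumption modeled on the question's data, not the actual file), the following shows that split() returns the number of pieces and that omitting the third argument falls back to FS:

```shell
# Hypothetical sample line, tab-separated like the data in the question.
line=$(printf '101\t101111\tError')

# split() returns the number of pieces; the pieces land in a[1]..a[n].
with_re=$(printf '%s\n' "$line" | awk '{ n = split($0, a, /\t/); print n, a[1], a[2], a[3] }')

# With the third argument omitted, split() uses the current value of FS.
with_default=$(printf '%s\n' "$line" | awk '{ n = split($0, a); print n, a[1], a[2], a[3] }')

echo "$with_re"        # 3 101 101111 Error
echo "$with_default"   # 3 101 101111 Error
```

Both calls give the same result here only because the sample line contains no spaces inside a field.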

Given that, your code should be:

awk '{split($0,a,/\t/); print a[1], a[2]}' data

Note that the 3rd arg to split() is an RE and so you should NOT do either of these things suggested elsethread:

awk '{split($0,a,"\t")...
awk '{split($0,a,FS)...

"\t" is wrong because that is a constant string, not a constant RE (/\t/), and so requires awk to parse it twice, which leads to complications when escaping characters.
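To illustrate the double parsing, here is a small sketch using a made-up input string and a two-dot separator (both are assumptions chosen for the example, not taken from the question). The regex constant needs one level of escaping, while the string form needs every backslash doubled to reach the same result:

```shell
# Hypothetical input: pieces separated by two literal dots.
s='a..b..c'

# Regex constant: parsed once, so a single backslash escapes each dot.
re_const=$(printf '%s\n' "$s" | awk '{ n = split($0, p, /\.\./); print n, p[1], p[2], p[3] }')

# String constant: parsed first as a string and then again as a regex,
# so each backslash has to be doubled to survive the first pass.
re_string=$(printf '%s\n' "$s" | awk '{ n = split($0, p, "\\.\\."); print n, p[1], p[2], p[3] }')

echo "$re_const"    # 3 a b c
echo "$re_string"   # 3 a b c
```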

FS is wrong because that's just redundantly specifying the default that you'd get from split($0,a).

Upvotes: 1

BMW

Reputation: 45293

In awk, the default field separator is whitespace. Here is the definition of whitespace:

Fields are normally separated by whitespace sequences (spaces, TABs, and newlines), not by single spaces.

So in your code, by the time you use $1 and $2, awk has already split the line on the default field separator (whitespace). If you want to try the split function, you need to apply it to $0 (the whole line). Others have already provided the solution, so I won't repeat it.

One tip for your case: use FS as the field separator in the split function, so you don't need to care whether the input contains a single space, several spaces, a tab, or other mixed whitespace. For example:

awk '{split($0,a,FS); print a[1],a[2]}' file
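A quick way to see this, assuming a made-up line with leading spaces and a mix of spaces and tabs (not the question's actual file) and the default FS:

```shell
# Hypothetical messy line: leading spaces, then fields separated by
# assorted runs of spaces and tabs.
messy=$(printf '  101 \t 101111\t  Error')

# With the default FS (a single space), split() treats any run of
# whitespace as one separator and ignores leading whitespace.
out=$(printf '%s\n' "$messy" | awk '{ n = split($0, a, FS); print n, a[1], a[2] }')

echo "$out"   # 3 101 101111
```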

Upvotes: 0

jaypal singh

Reputation: 77145

This is how the split function works:

$ cat file
RequestID       CustomerID      Status
101     101111  Error
102     323232  Success
103     33433   Error

$ awk '{split($0,a,"\t"); print a[1],a[2]}' file
RequestID CustomerID
101 101111
102 323232
103 33433

The function takes a string (which in your case should be the entire line, i.e. $0), followed by an array name, in this case a. The last argument is the delimiter, which defaults to FS (whitespace unless you change it) if not provided; in your case it is "\t".
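The choice of delimiter only becomes visible when a field contains a space. The sketch below uses a hypothetical record with a two-word name in the second column (an assumption for illustration; the question's data has no such field):

```shell
# Hypothetical tab-separated record whose second field contains a space.
rec=$(printf '101\tJohn Smith\tError')

# Splitting on tab keeps "John Smith" together as one piece.
by_tab=$(printf '%s\n' "$rec" | awk '{ n = split($0, a, "\t"); print n ": " a[2] }')

# Splitting on the default FS breaks at the space as well.
by_fs=$(printf '%s\n' "$rec" | awk '{ n = split($0, a); print n ": " a[2] }')

echo "$by_tab"   # 3: John Smith
echo "$by_fs"    # 4: John
```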

Upvotes: 1

kojiro

Reputation: 77137

It is printing a[1], which is the entire first field, and b[2], which is empty. That is because you're splitting the second field on its own (for example, '101111') on tabs, and since the field contains no tabs, the result is an array with just one element.
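One way to check this is to print the return value of split() together with the first two array elements. This is a minimal sketch using a single sample row piped in with printf rather than the question's data file:

```shell
# $2 is "101111" and contains no tab, so split() yields a single piece.
out=$(printf '101\t101111\tError\n' |
  awk '{ n = split($2, b, "\t"); print n, "[" b[1] "]", "[" b[2] "]" }')

echo "$out"   # 1 [101111] []
```

The brackets are only there to make the empty b[2] visible.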

Unless you change the field separator, awk will split input rows into fields on whitespace, so splitting on tabs is redundant. You could just print $1, $2. If you really want to see the split function in operation, try a separator other than whitespace (output shown for the data rows only):

awk '{split($1, a, "0"); print a[1], a[2];}' < input
1 1
1 2
1 3

Upvotes: 1
