shnboom

Reputation: 33

How to write a condition using awk in bash to transpose a file?

I am trying to write code to transpose a given file in bash. Suppose the file 'file.txt' is in the following format:

name age
alice 21
ryan 30

We have to transpose the file so that it is in the following format:

name alice ryan
age 21 30

I came across the following solution for this question:

awk '{for(i=0;++i<=NF;)a[i]=a[i]?a[i] FS $i:$i}END{for(i=0;i++<NF;)print a[i]}' file.txt

In the following condition

a[i]=a[i]?a[i] FS $i:$i

what does a[i] refer to? How does this condition lead to the transpose of the file?

Upvotes: 2

Views: 161

Answers (2)

RavinderSingh13

Reputation: 133518

Please go through the following detailed, line-by-line explanation; I hope it helps.

awk '                               ##Start of the awk program.
{                                   ##Start of the main block, run once per input line.
  for(i=0;++i<=NF;){                ##Loop with i running from 1 to NF (number of fields in the current line); ++i increments i before each comparison.
    a[i]=a[i]?a[i] FS $i:$i         ##If a[i] already has a value, append FS and field i to it; otherwise set a[i] to field i.
  }                                 ##End of the for loop.
}                                   ##End of the main block.
END{                                ##Start of the END block, run after all input has been read.
  for(i=0;i++<NF;){                 ##Loop with i running from 1 to NF (here NF is the field count of the last line read).
    print a[i]                      ##Print the element of array a at index i.
  }                                 ##End of the for loop.
}                                   ##End of the END block.
' Input_file                        ##Name of the input file.


My suggested code, which fixes a few things in the OP's code to make it more robust:

awk '                               ##Start of the awk program.
{                                   ##Start of the main block, run once per input line.
  for(i=1;i<=NF;i++){               ##Loop with i running from 1 to NF (number of fields in the current line).
    a[i]=(a[i]?a[i] FS:"")$i        ##Append field i to a[i], putting FS in front of it only when a[i] already has a value.
  }                                 ##End of the for loop.
  nf=NF?NF:nf                       ##Remember the field count in nf; when NF is 0 (empty line) keep the previous value, so the END block does not depend on NF.
}                                   ##End of the main block.
END{                                ##Start of the END block, run after all input has been read.
  for(i=1;i<=nf;i++){               ##Loop with i running from 1 to nf.
    print a[i]                      ##Print the element of array a at index i.
  }                                 ##End of the for loop.
}                                   ##End of the END block.
' Input_file                        ##Name of the input file.

Following are the fixes done to the OP's code:

  • Field numbers in awk start from 1, NOT from 0 ($0 is the whole line), so the for loops now start at i=1 explicitly. The OP's loops also visit 1 to NF, because i is incremented inside the condition, but that form is harder to read.
  • Rewrote the assignment to array a; this is a syntax cleanup so that $i is written only once.
  • Created a variable named nf which takes its value from each line's NF. If the last line is empty, nf keeps the previous non-zero value, whereas the OP's approach (using NF directly in the END block) would see an NF of 0 there.
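
To illustrate the last point, here is a minimal sketch that runs the suggested code on the question's sample data followed by one empty line. The trailing empty line and the function name transpose are assumptions added for illustration, not part of the original question.

```shell
# Hypothetical wrapper around the suggested awk program; reads from stdin.
transpose() {
  awk '{
    for (i = 1; i <= NF; i++) a[i] = (a[i] ? a[i] FS : "") $i
    nf = NF ? NF : nf      # on an empty line NF is 0, so nf keeps its old value
  }
  END { for (i = 1; i <= nf; i++) print a[i] }'
}

# The sample data from the question plus a trailing empty line.
result=$(printf 'name age\nalice 21\nryan 30\n\n' | transpose)
printf '%s\n' "$result"
# name alice ryan
# age 21 30
```

On the empty line the loop body never runs and nf stays at 2, so both rows are still printed.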


Answer to the OP's question, how the transpose happens:

  • We traverse all fields of the current line (with the for loop).
  • We build array a with the field number as its index, so values with the same field number (the same column) land in the same array element. Each element therefore accumulates one whole column, joined by FS.
  • While printing, we simply go through the indexes again, from 1 to nf (the variable created above), so each column is printed as one line.
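
The steps above can be traced on the question's sample data. This sketch is not part of the original answer; it only adds printf calls to show the contents of array a after every input line, and the variable name trace is hypothetical.

```shell
trace=$(printf 'name age\nalice 21\nryan 30\n' | awk '{
  for (i = 1; i <= NF; i++)
    a[i] = (a[i] ? a[i] FS : "") $i     # same update as in the suggested code
  printf "line %d:", NR                 # show the array state after this line
  for (i = 1; i <= NF; i++)
    printf " a[%d]=\"%s\"", i, a[i]
  print ""
}')
printf '%s\n' "$trace"
# line 1: a[1]="name" a[2]="age"
# line 2: a[1]="name alice" a[2]="age 21"
# line 3: a[1]="name alice ryan" a[2]="age 21 30"
```

After the last line, a[1] holds the first column and a[2] the second, which is exactly what the END block prints.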


One more way of understanding:

Let's picture your data like this (symbols standing in for the actual values):

| @
| @
| @

We store every value in the array element whose index is its field number. So after the last line, index 1 holds | | | and index 2 holds @ @ @. In the END section we then run a for loop that goes from 1 to NF.

So first all values with index 1 are printed, then all values with index 2, and so on, like this:

| | |
@ @ @

Since we concatenate all values of a COLUMN into the array element for that COLUMN NUMBER, and then print the elements in COLUMN NUMBER order with a for loop, the output is the transpose of the original data.

Upvotes: 3

Gilles Quénot

Reputation: 185161

This is the ternary operator.

With an if statement, it would be more readable:

if (a[i]) {
    a[i]=a[i] FS $i # concatenation
} else {
    a[i]=$i
}
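
For completeness, here is a sketch of the whole command with the if statement in place, run on the question's sample data. The variable nf (the widest row seen so far) is an assumption added so that the END block does not depend on NF; it is not part of the snippet above.

```shell
result=$(printf 'name age\nalice 21\nryan 30\n' | awk '
{
  for (i = 1; i <= NF; i++) {
    if (a[i]) {
      a[i] = a[i] FS $i    # a[i] already holds earlier rows: append
    } else {
      a[i] = $i            # first value seen for column i
    }
  }
  if (NF > nf) nf = NF     # hypothetical helper: remember the widest row
}
END {
  for (i = 1; i <= nf; i++) print a[i]
}')
printf '%s\n' "$result"
# name alice ryan
# age 21 30
```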

Upvotes: 0
