Reputation: 23

Efficiently transpose output from tab / new line delimited file in bash in dynamically changing file

EDIT The number of lines and tab-separate values is also dynamic as it can change. So it might be 1-5 or 1-10 with the same layout but the region will only be listed once.

I have a file with in the following format:(@TSV)

host1   host2   host3
id1 id2 id3
ip1 ip2 ip3
name1   name2   name3
role1   role2   role3
region

I can also format the file like:

host1
host2
host3
id1
id2
id3
ip1
ip2
ip3
name1
name2
name3
role1
role2
role3
region

I would like to write a new file or modify this file inline so the file is in this format: (tsv)

host1   id1 ip1 name1 role1 region
host2   id2 ip2 name2 role2 region
host3   id3 ip3 name3 role3 region

I have tried without success to use awk, sed, for loops... I need some fresh ideas.

Upvotes: 2

Answers (3)

Ed Morton

Reputation: 204731

The idiomatic awk approach to transposing rows to columns:

$ cat tst.awk
BEGIN { FS=OFS="\t" }
{
    numCols = NR
    numRows = (NF>numRows ? NF : numRows)
    for (rowNr=1; rowNr<=NF; rowNr++) {
        vals[rowNr,numCols] = $rowNr
    }
}
END {
    for (rowNr=1; rowNr<=numRows; rowNr++) {
        for (colNr=1; colNr<=numCols; colNr++) {
            val = ((rowNr,colNr) in vals ? vals[rowNr,colNr] : vals[1,colNr])
            printf "%s%s", val, (colNr<numCols ? OFS : ORS)
        }
    }
}

$ awk -f tst.awk file
host1   id1     ip1     name1   role1   region
host2   id2     ip2     name2   role2   region
host3   id3     ip3     name3   role3   region

The above was run on your first input file:

$ cat file
host1   host2   host3
id1     id2     id3
ip1     ip2     ip3
name1   name2   name3
role1   role2   role3
region

Note the script makes no reference to any values in your input, nor how many rows or columns you have nor any other assumptions about the content of your input file except that if values are missing you want the first one repeated.

Upvotes: 1

karakfa

Reputation: 67567

Starting with the list formatted version, if you had no missing data, i.e. "religion" 3 times, it would be much easier.

You can add the missing values on the fly and then simply pr

$ awk '1; END{print;print}' file | pr -6ts

host1   id1     ip1     name1   role1   region
host2   id2     ip2     name2   role2   region
host3   id3     ip3     name3   role3   region

if the number of columns are known and only the last values might be missing, you can parametrize by number of columns

$ cols=6; awk -v cols=$cols '1; END{for(i=1;i<=(NR-cols)/(cols-1);i++) print}' file |
  pr -${cols}ts

Upvotes: 0

hek2mgl

Reputation: 158280

You can use the following awk script:

# translate.awk

NR==1 {
    split($0,hosts)
}
NR==2 {
    split($0,ids)
}
NR==3{
    split($0,ips)
}
NR==4{
    split($0,names)
}
NR==5{
    split($0,roles)
}
NR==6{
    region=$1
}

END{
    OFS="\t"
    for(i in hosts) {
        print hosts[i], ids[i], ips[i], names[i], roles[i], region
    }
}

Call it like this:

awk -f translate.awk input.file

Upvotes: 0

Efficiently transpose output from tab / new line delimited file in bash in dynamically changing file

Answers (3)

Related Questions