Santosh Pillai
Santosh Pillai

Reputation: 1391

Transpose using AWK or Perl

Hi I would like to use AWK or Perl to get an output file in the format below. My input file is a space separated text file. This is similar to an earlier question of mine, but in this case the input and output has no formatting. My column positions may change so would appreciate a technique which does not reference column number

Input File

id quantity colour shape size colour shape size colour shape size
1 10 blue square 10 red triangle 12 pink circle 20
2 12 yellow pentagon 3 orange rectangle 4 purple oval 6

Desired Output

id colour shape size
1 blue square 10
1 red triangle 12
1 pink circle 20
2 yellow pentagon 3
2 orange rectangle 4
2 purple oval 6

I am using this code by Dennis Williamson. Only problem is the output I get has no space separation in the transposed fields. I require one space separation

#!/usr/bin/awk -f
BEGIN {
col_list = "quantity colour shape"
# Use a B ("blank") to add spaces in the output before or
# after a format string (e.g. %6dB), but generally use the numeric argument

# columns to be repeated on multiple lines may appear anywhere in
# the input, but they will be output together at the beginning of the line
repeat_fields["id"]
# since these are individually set we won't use B
repeat_fmt["id"] = "%-1s "
# additional fields to repeat on each line

ncols = split(col_list, cols)

for (i = 1; i <= ncols; i++) {
    col_names[cols[i]]
    forms[cols[i]] = "%-1s"
}
}


# save the positions of the columns using the header line
FNR == 1 {
for (i = 1; i <= NF; i++) {
    if ($i in repeat_fields) {
        repeat[++nrepeats] = i
        repeat_look[i] = i
        rformats[i] = repeat_fmt[$i]
    }
    if ($i in col_names) {
        col_nums[++n] = i
        col_look[i] = i
        formats[i] = forms[$i]
    }
}
# print the header line
for (i = 1; i <= nrepeats; i++) {
    f = rformats[repeat[i]]
    sub("d", "s", f)
    gsub("B", " ", f)
    printf f, $repeat[i]
}
for (i = 1; i <= ncols; i++) {
    f = formats[col_nums[i]]
    sub("d", "s", f)
    gsub("B", " ", f)
    printf f, $col_nums[i]
}
printf "\n"
next
}

{
for (i = 1; i <= NF; i++) {
    if (i in repeat_look) {
        f = rformats[i]
        gsub("B", " ", f)
        repeat_out = repeat_out sprintf(f, $i)

    }
    if (i in col_look) {
        f = formats[i]
        gsub("B", " ", f)
        out = out sprintf(f, $i)
        coln++
    }
    if (coln == ncols) {
        print repeat_out out
        out = ""
        coln = 0
    }
}
repeat_out = ""
}

Output

id quantitycolourshape
1 10bluesquare
1 redtrianglepink
2 circle12yellow
2 pentagonorangerectangle

My apologies for not including all info about the actual file earlier. I did this only for simplicity, but it did not capture all my requirements.

In my actual file I am looking to transpose fields n_cell and n_bsc for NODE SITE CHILD

NODE SITE CHILD n_cell n_bsc

Here is a link to the actual file I am working on

Upvotes: 1

Views: 582

Answers (2)

Borodin
Borodin

Reputation: 126772

You have told us that your real data consists of over 5,000 columns and that its column positions may change and I'm afraid that really isn't enough.

So in the absence of any proper information I have written this, which uses the header line to calculate the number and size of the sets of data, where the id column is, and in which column the first set starts.

It works fine on your example data, but I can only guess whether it will work on your live file.

use strict;
use warnings;

my @headers = split ' ', <>;

my %headers;
$headers{$_}++ for @headers;

die "Expected exactly one 'id' column" unless $headers{id} // 0 == 1;
my $id_index = 0;
$id_index++ while $headers[$id_index] ne 'id';

my @labels = grep $headers{$_} > 1, keys %headers;
my $set_size = @labels;
my $num_sets = $headers{$labels[0]};

my $start_index = 0;
$start_index++ while $headers[$start_index] ne $labels[0];

my @reformat;

while (<>) {
  my @fields = split;
  next unless @fields;
  my $id = $fields[$id_index];
  for (my $i = $start_index; $i < @fields; $i+=$set_size) {
    push @reformat, [ $id, @fields[$i..$i + $set_size - 1] ];
  }
}

unshift @labels, 'id';
print "@labels\n";
print "@$_\n" for @reformat;

output

id colour shape size
1 blue square 10
1 red triangle 12
1 pink circle 20
2 yellow pentagon 3
2 orange rectangle 4
2 purple oval 6

Upvotes: 0

ikegami
ikegami

Reputation: 386706

<>;
print("id colour shape size\n");

while (<>) {
   my @combined_fields = split;
   my $id = shift(@combined_fields);
   while (@combined_fields) {
       my @fields = ( $id, splice(@combined_fields, 0, 3) );
       print(join(' ', @fields), "\n");
   }
}

Upvotes: 3

Related Questions