Reputation: 1391
Hi I would like to use AWK or Perl to get an output file in the format below. My input file is a space separated text file. This is similar to an earlier question of mine, but in this case the input and output has no formatting. My column positions may change so would appreciate a technique which does not reference column number
Input File
id quantity colour shape size colour shape size colour shape size
1 10 blue square 10 red triangle 12 pink circle 20
2 12 yellow pentagon 3 orange rectangle 4 purple oval 6
Desired Output
id colour shape size
1 blue square 10
1 red triangle 12
1 pink circle 20
2 yellow pentagon 3
2 orange rectangle 4
2 purple oval 6
I am using this code by Dennis Williamson. Only problem is the output I get has no space separation in the transposed fields. I require one space separation
#!/usr/bin/awk -f
BEGIN {
col_list = "quantity colour shape"
# Use a B ("blank") to add spaces in the output before or
# after a format string (e.g. %6dB), but generally use the numeric argument
# columns to be repeated on multiple lines may appear anywhere in
# the input, but they will be output together at the beginning of the line
repeat_fields["id"]
# since these are individually set we won't use B
repeat_fmt["id"] = "%-1s "
# additional fields to repeat on each line
ncols = split(col_list, cols)
for (i = 1; i <= ncols; i++) {
col_names[cols[i]]
forms[cols[i]] = "%-1s"
}
}
# save the positions of the columns using the header line
FNR == 1 {
for (i = 1; i <= NF; i++) {
if ($i in repeat_fields) {
repeat[++nrepeats] = i
repeat_look[i] = i
rformats[i] = repeat_fmt[$i]
}
if ($i in col_names) {
col_nums[++n] = i
col_look[i] = i
formats[i] = forms[$i]
}
}
# print the header line
for (i = 1; i <= nrepeats; i++) {
f = rformats[repeat[i]]
sub("d", "s", f)
gsub("B", " ", f)
printf f, $repeat[i]
}
for (i = 1; i <= ncols; i++) {
f = formats[col_nums[i]]
sub("d", "s", f)
gsub("B", " ", f)
printf f, $col_nums[i]
}
printf "\n"
next
}
{
for (i = 1; i <= NF; i++) {
if (i in repeat_look) {
f = rformats[i]
gsub("B", " ", f)
repeat_out = repeat_out sprintf(f, $i)
}
if (i in col_look) {
f = formats[i]
gsub("B", " ", f)
out = out sprintf(f, $i)
coln++
}
if (coln == ncols) {
print repeat_out out
out = ""
coln = 0
}
}
repeat_out = ""
}
Output
id quantitycolourshape
1 10bluesquare
1 redtrianglepink
2 circle12yellow
2 pentagonorangerectangle
My apologies for not including all info about the actual file earlier. I did this only for simplicity, but it did not capture all my requirements.
In my actual file I am looking to transpose fields n_cell and n_bsc for NODE SITE CHILD
NODE SITE CHILD n_cell n_bsc
Here is a link to the actual file I am working on
Upvotes: 1
Views: 582
Reputation: 126772
You have told us that your real data consists of over 5,000 columns and that its column positions may change and I'm afraid that really isn't enough.
So in the absence of any proper information I have written this, which uses the header line to calculate the number and size of the sets of data, where the id
column is, and in which column the first set starts.
It works fine on your example data, but I can only guess whether it will work on your live file.
use strict;
use warnings;
my @headers = split ' ', <>;
my %headers;
$headers{$_}++ for @headers;
die "Expected exactly one 'id' column" unless $headers{id} // 0 == 1;
my $id_index = 0;
$id_index++ while $headers[$id_index] ne 'id';
my @labels = grep $headers{$_} > 1, keys %headers;
my $set_size = @labels;
my $num_sets = $headers{$labels[0]};
my $start_index = 0;
$start_index++ while $headers[$start_index] ne $labels[0];
my @reformat;
while (<>) {
my @fields = split;
next unless @fields;
my $id = $fields[$id_index];
for (my $i = $start_index; $i < @fields; $i+=$set_size) {
push @reformat, [ $id, @fields[$i..$i + $set_size - 1] ];
}
}
unshift @labels, 'id';
print "@labels\n";
print "@$_\n" for @reformat;
output
id colour shape size
1 blue square 10
1 red triangle 12
1 pink circle 20
2 yellow pentagon 3
2 orange rectangle 4
2 purple oval 6
Upvotes: 0
Reputation: 386706
<>;
print("id colour shape size\n");
while (<>) {
my @combined_fields = split;
my $id = shift(@combined_fields);
while (@combined_fields) {
my @fields = ( $id, splice(@combined_fields, 0, 3) );
print(join(' ', @fields), "\n");
}
}
Upvotes: 3