firoz
firoz

Reputation: 91

Generating missing values in a discontinuous table

A file contains several tables. Each table has three columns separated by tabs.

The first column contains the serial number which is not continuous. I want to print missing serial number with second and third column with 0 0.

input file

Pos freq catagory
    13    1       4
    14    2       3
    17    5       2
    20    3       2 <<<<<
    21    1       4
    25    1       4
Pos freq catagory
    10    1       4
    12    2       3
    13    5       2
    14    3       2 <<<<<
    15    1       4

output file

Pos freq catagory
    13    1       4
    14    2       3
    15    0       0
    16    0       0
    17    5       2
    18    0       0
    19    0       0
    20    3       2 <<<<<
    21    1       4
    22    0       0
    23    0       0
    24    0       0
    25    1       4
Pos freq catagory
    10    1       4
    11 0 0 
    12    2       3
    13    5       2
    14    3       2 <<<<<
    15 1 4

Could you please help me to do it in Perl, sed, or awk?

Upvotes: 0

Views: 90

Answers (3)

Borodin
Borodin

Reputation: 126742

This Perl program is unnecessarily long because it tries to use the spacing in the original file in the filled data lines. It also gets things wrong if the first field is fixed length with leading spaces.

Yes, I know It is silly to write software that guesses the proper specification.

use strict;
use warnings;

my $seq;

while (<>) {
  if ( / ( (\s*) (\d+) ) /x ) {
    if (defined $seq) {
      printf "%s%d\t0\t0\n", $2, $_ for $seq+1 .. $3-1;
    }
    $seq = $3;
  }
  else {
    $seq = undef;
  }
  print;
}

output

Pos freq catagory
    13    1       4
    14    2       3
    15  0 0
    16  0 0
    17    5       2
    18  0 0
    19  0 0
    20    3       2 <<<<<
    21    1       4
    22  0 0
    23  0 0
    24  0 0
    25    1       4
Pos freq catagory
    10    1       4
    11  0 0
    12    2       3
    13    5       2
    14    3       2 <<<<<
    15    1       4

Upvotes: 1

Kent
Kent

Reputation: 195209

$ awk -F'\t' -v OFS="\t" '!p{p=$1-1}++p<$1{for(p;p<$1;p++)print p,0,0}7' file 
13  1   4
14  2   3
15  0   0
16  0   0
17  5   2
18  0   0
19  0   0
20  3   2
21  1   4
22  0   0
23  0   0
24  0   0
25  1   4

Upvotes: 1

Lev Levitsky
Lev Levitsky

Reputation: 65821

Just in case you are interested in a coreutils+sed solution:

$ join -a2 -j1 -t $'\t' input_file <(seq 13 25) | sed 's/^[^[:space:]]*$/&\t0\t0/'
13      1       4
14      2       3
15      0       0
16      0       0
17      5       2
18      0       0
19      0       0
20      3       2
21      1       4
22      0       0
23      0       0
24      0       0
25      1       4

Upvotes: 1

Related Questions