Reputation: 91

Generating missing values in a discontinuous table

A file contains several tables. Each table has three columns separated by tabs.

The first column contains the serial number which is not continuous. I want to print missing serial number with second and third column with 0 0.

input file

Pos freq catagory
    13    1       4
    14    2       3
    17    5       2
    20    3       2 <<<<<
    21    1       4
    25    1       4
Pos freq catagory
    10    1       4
    12    2       3
    13    5       2
    14    3       2 <<<<<
    15    1       4

output file

Pos freq catagory
    13    1       4
    14    2       3
    15    0       0
    16    0       0
    17    5       2
    18    0       0
    19    0       0
    20    3       2 <<<<<
    21    1       4
    22    0       0
    23    0       0
    24    0       0
    25    1       4
Pos freq catagory
    10    1       4
    11 0 0 
    12    2       3
    13    5       2
    14    3       2 <<<<<
    15 1 4

Could you please help me to do it in Perl, sed, or awk?

Upvotes: 0

Answers (3)

Borodin

Reputation: 126742

This Perl program is unnecessarily long because it tries to use the spacing in the original file in the filled data lines. It also gets things wrong if the first field is fixed length with leading spaces.

Yes, I know It is silly to write software that guesses the proper specification.

use strict;
use warnings;

my $seq;

while (<>) {
  if ( / ( (\s*) (\d+) ) /x ) {
    if (defined $seq) {
      printf "%s%d\t0\t0\n", $2, $_ for $seq+1 .. $3-1;
    }
    $seq = $3;
  }
  else {
    $seq = undef;
  }
  print;
}

output

Pos freq catagory
    13    1       4
    14    2       3
    15  0 0
    16  0 0
    17    5       2
    18  0 0
    19  0 0
    20    3       2 <<<<<
    21    1       4
    22  0 0
    23  0 0
    24  0 0
    25    1       4
Pos freq catagory
    10    1       4
    11  0 0
    12    2       3
    13    5       2
    14    3       2 <<<<<
    15    1       4

Upvotes: 1

Kent

Reputation: 195209

$ awk -F'\t' -v OFS="\t" '!p{p=$1-1}++p<$1{for(p;p<$1;p++)print p,0,0}7' file 
13  1   4
14  2   3
15  0   0
16  0   0
17  5   2
18  0   0
19  0   0
20  3   2
21  1   4
22  0   0
23  0   0
24  0   0
25  1   4

Upvotes: 1

Lev Levitsky

Reputation: 65821

Just in case you are interested in a coreutils+sed solution:

$ join -a2 -j1 -t $'\t' input_file <(seq 13 25) | sed 's/^[^[:space:]]*$/&\t0\t0/'
13      1       4
14      2       3
15      0       0
16      0       0
17      5       2
18      0       0
19      0       0
20      3       2
21      1       4
22      0       0
23      0       0
24      0       0
25      1       4

Upvotes: 1

Generating missing values in a discontinuous table

Answers (3)

Related Questions