Reputation: 91
A file contains several tables. Each table has three columns separated by tabs.
The first column contains the serial number which is not continuous. I want to print missing serial number with second and third column with 0 0.
input file
Pos freq catagory
13 1 4
14 2 3
17 5 2
20 3 2 <<<<<
21 1 4
25 1 4
Pos freq catagory
10 1 4
12 2 3
13 5 2
14 3 2 <<<<<
15 1 4
output file
Pos freq catagory
13 1 4
14 2 3
15 0 0
16 0 0
17 5 2
18 0 0
19 0 0
20 3 2 <<<<<
21 1 4
22 0 0
23 0 0
24 0 0
25 1 4
Pos freq catagory
10 1 4
11 0 0
12 2 3
13 5 2
14 3 2 <<<<<
15 1 4
Could you please help me to do it in Perl, sed, or awk?
Upvotes: 0
Views: 90
Reputation: 126742
This Perl program is unnecessarily long because it tries to use the spacing in the original file in the filled data lines. It also gets things wrong if the first field is fixed length with leading spaces.
Yes, I know It is silly to write software that guesses the proper specification.
use strict;
use warnings;
my $seq;
while (<>) {
if ( / ( (\s*) (\d+) ) /x ) {
if (defined $seq) {
printf "%s%d\t0\t0\n", $2, $_ for $seq+1 .. $3-1;
}
$seq = $3;
}
else {
$seq = undef;
}
print;
}
output
Pos freq catagory
13 1 4
14 2 3
15 0 0
16 0 0
17 5 2
18 0 0
19 0 0
20 3 2 <<<<<
21 1 4
22 0 0
23 0 0
24 0 0
25 1 4
Pos freq catagory
10 1 4
11 0 0
12 2 3
13 5 2
14 3 2 <<<<<
15 1 4
Upvotes: 1
Reputation: 195209
$ awk -F'\t' -v OFS="\t" '!p{p=$1-1}++p<$1{for(p;p<$1;p++)print p,0,0}7' file
13 1 4
14 2 3
15 0 0
16 0 0
17 5 2
18 0 0
19 0 0
20 3 2
21 1 4
22 0 0
23 0 0
24 0 0
25 1 4
Upvotes: 1
Reputation: 65821
Just in case you are interested in a coreutils+sed solution:
$ join -a2 -j1 -t $'\t' input_file <(seq 13 25) | sed 's/^[^[:space:]]*$/&\t0\t0/'
13 1 4
14 2 3
15 0 0
16 0 0
17 5 2
18 0 0
19 0 0
20 3 2
21 1 4
22 0 0
23 0 0
24 0 0
25 1 4
Upvotes: 1