Reputation: 371
Most of my Perl regex successes were by accident but can't get this one at the moment. I need another accident.
I think I am nearly there with this problem and my effort is as follows:
while ( my $line = <INPUT> ) {
if($line =~ /\s{6,11}(\w{1}\d{1}\w{0,5})\s{0,5}(\d{3})/) {
my @c_no = "$1,$2\n";
foreach (@c_no) {
print OUTPUT $_;
}
if ($line =~ /^(\s{1}THN)(\s{1,3}\d{0,2}.\d)(\s{1,3}\d{0,2}.\d)(\s{1,2}\d{0,1}.\d{4})(\s{1,2}\d{2,4})
(\s{2,3}\d{1,2})(\s{1,6}\d{1,4})(\s{1,2}\d{1,2})(\s{1,5}\d{1,4})(\s{1,4}\d{1,4})
(\s{1,4}\d{1,4})(\s{1,4}\d{1,4})(\s{1,4}\d{1,4})(\s{1,4}\d{1,4})|(^1(P1))/x){
print OUTPUT "$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14\n";
}
}
}
I am trying to read a file, print the first match(first 'if..') to output, look for the second match(second 'if..'), output it or them, and then look further for the first match, and so on.
The above code prints only the first matches.
I can get it to print the second with the code changed, but not both.
I think I am nearly there and it might need just a change in the position of the curly brackets.
The input file looks +- as follows:
1(P1) 3 P.ell 05/0120 W/P068819 0 12.0 98/99 380 380 C03 103 PROCESSED 18/02/16 TIME 19.07.50 KSINA=8
AGE DBH HT VTREE SPH BA TOTAL WS UTIL S A B C D TCAI CTCAI TMAI UCAI CUCAI UMAI SCAI CSCAI SMAI IAGE
1 .0 .2 .0000 979 0 0 0 0 0 0 0 0 0 .0 0 .0 .0 0 .0 .0 0 .0 1.0
2 .0 .9 .0000 979 0 0 0 0 0 0 0 0 0 .0 0 .0 .0 0 .0 .0 0 .0 2.0
3 3.9 2.0 .0007 979 1 1 1 0 0 0 0 0 0 .7 1 .2 .0 0 .0 .0 0 .0 3.0
4 7.1 3.4 .0041 979 4 4 3 1 1 0 0 0 0 3.4 4 1.0 .6 1 .2 .0 0 .0 4.0
5 9.4 4.6 .0102 979 7 10 5 5 5 0 0 0 0 5.9 10 2.0 4.1 5 .9 .0 0 .0 5.0
6 11.3 5.7 .0188 979 10 18 6 12 12 1 0 0 0 8.4 18 3.1 7.5 12 2.0 .0 0 .0 6.0
7 13.0 6.7 .0293 979 13 29 7 22 19 3 0 0 0 10.3 29 4.1 9.7 22 3.1 .0 0 .0 7.0
17%
THN 11.4 6.7 .0230 164 2 4 1 3 2 0 0 0 0
REM 13.4 6.7 .0315 815 12 26 6 20 17 3 0 0 0
8 15.0 7.6 .0453 815 14 37 6 31 21 10 0 0 0 11.2 40 5.0 10.9 33 4.1 .0 0 .0 7.6
9 16.4 8.5 .0607 815 17 49 6 43 23 20 0 0 0 12.5 52 5.8 12.2 45 5.0 .2 0 .0 8.6
10 17.4 9.4 .0771 815 19 63 7 56 24 30 2 0 0 13.4 66 6.6 13.1 58 5.8 1.3 2 .2 9.6
11 18.3 10.3 .0941 815 21 77 7 70 24 41 5 0 0 13.9 80 7.3 13.6 72 6.5 3.0 5 .4 10.6
12 19.0 11.3 .1118 815 23 91 7 84 24 50 10 0 0 14.4 94 7.8 14.1 86 7.2 5.4 10 .8 11.6
13 19.6 12.2 .1299 815 25 106 8 98 24 56 18 0 0 14.7 109 8.4 14.4 100 7.7 8.0 18 1.4 12.6
33%
THN 17.5 12.2 .1044 265 6 28 2 25 8 15 3 0 0
REM 20.6 12.2 .1421 550 18 78 5 73 16 42 15 0 0
14 21.3 13.0 .1636 550 20 90 6 84 16 44 25 0 0 11.8 121 8.6 11.6 112 8.0 10.0 28 2.0 10.4
15 22.0 13.7 .1864 550 21 103 6 97 16 45 36 0 0 12.5 133 8.9 12.3 124 8.3 11.0 39 2.6 11.2
16 22.7 14.5 .2100 550 22 116 6 109 15 46 48 0 0 13.0 146 9.1 12.7 137 8.6 12.0 51 3.2 12.0
17 23.3 15.3 .2345 550 23 129 6 123 15 46 61 0 0 13.5 160 9.4 13.2 150 8.8 12.9 64 3.8 12.8
18 23.9 15.9 .2598 550 25 143 7 136 15 46 74 1 0 13.9 174 9.6 13.6 164 9.1 13.8 78 4.3 13.6
40%
THN 21.6 15.9 .2142 220 8 47 2 45 6 19 20 0 0
REM 25.3 15.9 .2901 330 17 96 4 92 9 28 54 1 0
19 26.0 16.6 .3203 330 17 106 4 101 9 27 63 3 0 10.0 184 9.7 9.8 174 9.1 10.5 88 4.6 11.0
20 26.6 17.3 .3519 330 18 116 5 112 9 27 71 5 0 10.4 194 9.7 10.2 184 9.2 10.6 99 4.9 11.7
21 27.2 18.0 .3849 330 19 127 5 122 9 27 80 8 0 10.9 205 9.8 10.7 194 9.3 11.1 110 5.2 12.4
Expected result:
C03 103
THN 7 11.4 6.7 .0230 164 2 4 1 3 2 0 0 0 0
THN 13 17.5 12.2 .1044 265 6 28 2 25 8 15 3 0 0
THN 18 21.6 15.9 .2142 220 8 47 2 45 6 19 20 0 0
C04 103
THN etc.
.
.
.
.
Or just:
1(P1) 3 P.ell 05/0120 W/P068819 0 12.0 98/99 380 380 C03 103 PROCESSED 21/02/16 TIME 22.16.52 KSINA=8
THN 7 11.4 6.7 .0230 164 2 4 1 3 2 0 0 0 0
THN 13 17.5 12.2 .1044 265 6 28 2 25 8 15 3 0 0
THN 18 21.6 15.9 .2142 220 8 47 2 45 6 19 20 0 0
C04 103
THN etc.
.
.
.
.
I added the second expected result option because my question was not clear enough.
The 'C03 103' line was for the example input text I gave (above).
These two variables will change to 'C04 103' or 'A03a 104' or 'D051a 101' etc.
If the result can just supply the line containing those variables 1(P1) ..... that will be enough.
The data comes in repeated batches with the start line 1(P1).....
See example above.
Many thanks for the help. I am almost there.
My final edit, if this helps:
The full normal input batch is as follows. I cut some out to save space.
Probably not a good idea:
1(P1) 3 P.ell 05/0120 W/P068819 0 12.0 98/99 380 380 C03 104 PROCESSED 21/02/16 TIME 22.16.52 KSINA=8
AGE SPH %THN %INC SV SI MAI20 HTPC VIPC AGE BA DBH HT SPH CIH% CIV% CVD BCON CMAI C0 C0CAL SI20
0 1100 .0 89.0%SPH 2 2 .00 .0 .0 20.00 1 .0 17.3 0 .0 .0 .0 .0 .0000 .000 0% .00
7 815 25.9 .0 2 2 9.90 75.5 47.2 20.00 1 26.6 17.3 330 .0 .0 .0 13.0 .2099 1.005 .000 17.30
13 550 32.5 .0
18 330 40.0 .0
45 0 100.0 .0
0SQ -4 -4 -4 = SI20 17 17 17 PLANTN---104 GREEN MEADOWS MODEL---P.ELLIOTTII MAC MAC SQ 10 SI20 22.90
HTPC 76 76 76 =MAI20 10 10 10 FROM HTPC HTPC 100 MAI20 20.71
VIPC 47 47 47 =MAI20 10 10 10 HTPC/VIPC REGRESSION---P.ELLIOTTII GENERAL 1/83 VIPC 100 MAI20 20.99
MAIDBH 0
INMAI==> 0
0INPUT FOR CALCULATING HTPC & VIPC = HT ---- ----
AGE DBH HT VTREE SPH BA TOTAL WS UTIL S A B C D TCAI CTCAI TMAI UCAI CUCAI UMAI SCAI CSCAI SMAI IAGE
1 .0 .2 .0000 979 0 0 0 0 0 0 0 0 0 .0 0 .0 .0 0 .0 .0 0 .0 1.0
2 .0 .9 .0000 979 0 0 0 0 0 0 0 0 0 .0 0 .0 .0 0 .0 .0 0 .0 2.0
3 3.9 2.0 .0007 979 1 1 1 0 0 0 0 0 0 .7 1 .2 .0 0 .0 .0 0 .0 3.0
4 7.1 3.4 .0041 979 4 4 3 1 1 0 0 0 0 3.4 4 1.0 .6 1 .2 .0 0 .0 4.0
5 9.4 4.6 .0102 979 7 10 5 5 5 0 0 0 0 5.9 10 2.0 4.1 5 .9 .0 0 .0 5.0
6 11.3 5.7 .0188 979 10 18 6 12 12 1 0 0 0 8.4 18 3.1 7.5 12 2.0 .0 0 .0 6.0
7 13.0 6.7 .0293 979 13 29 7 22 19 3 0 0 0 10.3 29 4.1 9.7 22 3.1 .0 0 .0 7.0
17%
THN 11.4 6.7 .0230 164 2 4 1 3 2 0 0 0 0
REM 13.4 6.7 .0315 815 12 26 6 20 17 3 0 0 0
8 15.0 7.6 .0453 815 14 37 6 31 21 10 0 0 0 11.2 40 5.0 10.9 33 4.1 .0 0 .0 7.6
9 16.4 8.5 .0607 815 17 49 6 43 23 20 0 0 0 12.5 52 5.8 12.2 45 5.0 .2 0 .0 8.6
10 17.4 9.4 .0771 815 19 63 7 56 24 30 2 0 0 13.4 66 6.6 13.1 58 5.8 1.3 2 .2 9.6
11 18.3 10.3 .0941 815 21 77 7 70 24 41 5 0 0 13.9 80 7.3 13.6 72 6.5 3.0 5 .4 10.6
12 19.0 11.3 .1118 815 23 91 7 84 24 50 10 0 0 14.4 94 7.8 14.1 86 7.2 5.4 10 .8 11.6
13 19.6 12.2 .1299 815 25 106 8 98 24 56 18 0 0 14.7 109 8.4 14.4 100 7.7 8.0 18 1.4 12.6
33%
THN 17.5 12.2 .1044 265 6 28 2 25 8 15 3 0 0
REM 20.6 12.2 .1421 550 18 78 5 73 16 42 15 0 0
14 21.3 13.0 .1636 550 20 90 6 84 16 44 25 0 0 11.8 121 8.6 11.6 112 8.0 10.0 28 2.0 10.4
15 22.0 13.7 .1864 550 21 103 6 97 16 45 36 0 0 12.5 133 8.9 12.3 124 8.3 11.0 39 2.6 11.2
16 22.7 14.5 .2100 550 22 116 6 109 15 46 48 0 0 13.0 146 9.1 12.7 137 8.6 12.0 51 3.2 12.0
17 23.3 15.3 .2345 550 23 129 6 123 15 46 61 0 0 13.5 160 9.4 13.2 150 8.8 12.9 64 3.8 12.8
18 23.9 15.9 .2598 550 25 143 7 136 15 46 74 1 0 13.9 174 9.6 13.6 164 9.1 13.8 78 4.3 13.6
40%
THN 21.6 15.9 .2142 220 8 47 2 45 6 19 20 0 0
REM 25.3 15.9 .2901 330 17 96 4 92 9 28 54 1 0
19 26.0 16.6 .3203 330 17 106 4 101 9 27 63 3 0 10.0 184 9.7 9.8 174 9.1 10.5 88 4.6 11.0
20 26.6 17.3 .3519 330 18 116 5 112 9 27 71 5 0 10.4 194 9.7 10.2 184 9.2 10.6 99 4.9 11.7
21 27.2 18.0 .3849 330 19 127 5 122 9 27 80 8 0 10.9 205 9.8 10.7 194 9.3 11.1 110 5.2 12.4
22 27.9 18.7 .4192 330 20 138 5 133 8 26 87 11 0 11.3 216 9.8 11.1 206 9.3 11.5 121 5.5 13.2
23 28.4 19.3 .4546 330 21 150 5 145 8 26 94 16 0 11.7 228 9.9 11.4 217 9.4 11.8 133 5.8 14.0
24 29.0 20.0 .4914 330 22 162 5 157 8 26 101 22 0 12.2 240 10.0 11.9 229 9.5 12.3 145 6.1 14.9
25 29.6 20.6 .5292 330 23 175 6 169 8 25 106 29 0 12.5 253 10.1 12.2 241 9.6 12.6 158 6.3 15.7
26 30.2 21.2 .5682 330 24 188 6 182 8 25 112 37 0 12.9 265 10.2 12.6 254 9.8 13.0 171 6.6 16.5
27 30.7 21.8 .6083 330 25 201 6 194 8 25 115 46 0 13.2 279 10.3 13.0 267 9.9 13.3 184 6.8 17.3
28 31.3 22.4 .6492 330 25 214 7 208 8 24 119 56 1 13.5 292 10.4 13.2 280 10.0 13.6 198 7.1 18.2
29 31.9 23.0 .6908 330 26 228 7 221 8 24 122 65 2 13.7 306 10.5 13.5 293 10.1 13.8 212 7.3 19.0
30 32.4 23.5 .7332 330 27 242 7 235 8 24 123 77 3 14.0 320 10.7 13.7 307 10.2 14.0 226 7.5 19.8
31 33.0 23.9 .7766 330 28 256 7 249 8 24 125 88 5 14.3 334 10.8 14.0 321 10.4 14.3 240 7.7 20.4
32 33.6 24.4 .8202 330 29 271 8 263 8 23 126 99 7 14.4 349 10.9 14.1 335 10.5 14.4 255 8.0 21.0
Final, Final Edit: I forgot to state that I need to read off the last age recorded just before the 'THN' line. This age reading refers to the 'THN' line immediately below it. I have modified the expected output to include the age reading after the 'THN' entry.
I had not realized this until now and therefore it did not appear in my attempted code.
Upvotes: 3
Views: 211
Reputation: 53478
Honestly - for what you're trying to do, the answer is probably not a complicated regex.
I mean, based on what you've specified, this will do it:
#!/usr/bin/env perl
use strict;
use warnings;
while ( <> ) {
if ( m/PROCESSED/ ) {
print join " ",m/(\w+)\s+(\w+)\s+PROCESSED/,"\n";
}
print if /^THN/;
}
If your thing is more complicated than that, I'd suggest rather than regexing each line, you can instead:
#!/usr/bin/env perl
use strict;
use warnings;
my @header;
while (<>) {
if ( m/PROCESSED/ ) {
print join " ",m/(\w+)\s+(\w+)\s+PROCESSED/,"\n";
}
if (m/AGE/) {
chomp( @header = split );
}
my %this_row;
@this_row{@header} = split;
if ( $this_row{AGE} and $this_row{AGE} eq "THN" ) {
print join ("\t", @this_row{@header[0..13]}),"\n";
}
}
This makes use of hash slicing, to store each row as a hash, based on that header row. This is useful, because it means you can refer to elements by heading. E.g.
$this_row{AGE};
Overkill if you just want to print as you go though, where the first answer will do.
Based on comments:
Data::Dumper
is a way of printing a data structure (like a hash). Just removing it won't work, because you need to 'handle' printing the hash.THN
in it. But you can still use the first snippet if you use /^THN/
instead. Upvotes: 2
Reputation: 6742
It's not easy to know exactly what you're trying to do, but maybe you need an inner loop.
Something like this
while ( my $line = <INPUT> ) {
if($line =~ /\s{6,11}(\w{1}\d{1}\w{0,5})\s{0,5}(\d{3})/) {
my @c_no = "$1,$2\n";
foreach (@c_no) {
print OUTPUT $_;
}
while ( my $line = <INPUT> ) {
if ($line =~ /^(\s{1}THN)(\s{1,3}\d{0,2}.\d)(\s{1,3}\d{0,2}.\d)(\s{1,2}\d{0,1}.\d{4})(\s{1,2}\d{2,4})
(\s{2,3}\d{1,2})(\s{1,6}\d{1,4})(\s{1,2}\d{1,2})(\s{1,5}\d{1,4})(\s{1,4}\d{1,4})
(\s{1,4}\d{1,4})(\s{1,4}\d{1,4})(\s{1,4}\d{1,4})(\s{1,4}\d{1,4})|(^1(P1))/x){
print OUTPUT "$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14\n";
last;
}
}
}
}
Upvotes: 0