jlb333333
jlb333333

Reputation: 371

Perl regex: More than one line to match and output

Most of my Perl regex successes were by accident but can't get this one at the moment. I need another accident.
I think I am nearly there with this problem and my effort is as follows:

while ( my $line = <INPUT> ) {

                if($line =~ /\s{6,11}(\w{1}\d{1}\w{0,5})\s{0,5}(\d{3})/) {
                my @c_no = "$1,$2\n";   
                foreach (@c_no) {
                print OUTPUT $_;                    
                }
                if ($line =~ /^(\s{1}THN)(\s{1,3}\d{0,2}.\d)(\s{1,3}\d{0,2}.\d)(\s{1,2}\d{0,1}.\d{4})(\s{1,2}\d{2,4})
                            (\s{2,3}\d{1,2})(\s{1,6}\d{1,4})(\s{1,2}\d{1,2})(\s{1,5}\d{1,4})(\s{1,4}\d{1,4})
                            (\s{1,4}\d{1,4})(\s{1,4}\d{1,4})(\s{1,4}\d{1,4})(\s{1,4}\d{1,4})|(^1(P1))/x){

                print OUTPUT "$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14\n";
                }           
                }
                }

I am trying to read a file, print the first match(first 'if..') to output, look for the second match(second 'if..'), output it or them, and then look further for the first match, and so on.
The above code prints only the first matches.
I can get it to print the second with the code changed, but not both.
I think I am nearly there and it might need just a change in the position of the curly brackets.

The input file looks +- as follows:

1(P1)      3 P.ell    05/0120 W/P068819 0 12.0 98/99  380 380           C03     103          PROCESSED 18/02/16 TIME 19.07.50 KSINA=8

AGE  DBH   HT  VTREE  SPH  BA  TOTAL WS  UTIL    S    A    B    C    D   TCAI   CTCAI  TMAI   UCAI CUCAI  UMAI   SCAI CSCAI  SMAI IAGE

  1   .0   .2  .0000  979   0      0  0     0    0    0    0    0    0     .0     0    .0     .0     0    .0     .0     0    .0  1.0
  2   .0   .9  .0000  979   0      0  0     0    0    0    0    0    0     .0     0    .0     .0     0    .0     .0     0    .0  2.0
  3  3.9  2.0  .0007  979   1      1  1     0    0    0    0    0    0     .7     1    .2     .0     0    .0     .0     0    .0  3.0
  4  7.1  3.4  .0041  979   4      4  3     1    1    0    0    0    0    3.4     4   1.0     .6     1    .2     .0     0    .0  4.0
  5  9.4  4.6  .0102  979   7     10  5     5    5    0    0    0    0    5.9    10   2.0    4.1     5    .9     .0     0    .0  5.0
  6 11.3  5.7  .0188  979  10     18  6    12   12    1    0    0    0    8.4    18   3.1    7.5    12   2.0     .0     0    .0  6.0
  7 13.0  6.7  .0293  979  13     29  7    22   19    3    0    0    0   10.3    29   4.1    9.7    22   3.1     .0     0    .0  7.0
                   17%                                                                                                           
THN 11.4  6.7  .0230  164   2      4  1     3    2    0    0    0    0                                                              
REM 13.4  6.7  .0315  815  12     26  6    20   17    3    0    0    0                                                              

  8 15.0  7.6  .0453  815  14     37  6    31   21   10    0    0    0   11.2    40   5.0   10.9    33   4.1     .0     0    .0  7.6
  9 16.4  8.5  .0607  815  17     49  6    43   23   20    0    0    0   12.5    52   5.8   12.2    45   5.0     .2     0    .0  8.6
 10 17.4  9.4  .0771  815  19     63  7    56   24   30    2    0    0   13.4    66   6.6   13.1    58   5.8    1.3     2    .2  9.6
 11 18.3 10.3  .0941  815  21     77  7    70   24   41    5    0    0   13.9    80   7.3   13.6    72   6.5    3.0     5    .4 10.6
 12 19.0 11.3  .1118  815  23     91  7    84   24   50   10    0    0   14.4    94   7.8   14.1    86   7.2    5.4    10    .8 11.6
 13 19.6 12.2  .1299  815  25    106  8    98   24   56   18    0    0   14.7   109   8.4   14.4   100   7.7    8.0    18   1.4 12.6
                   33%                                                                                                           
THN 17.5 12.2  .1044  265   6     28  2    25    8   15    3    0    0                                                              
REM 20.6 12.2  .1421  550  18     78  5    73   16   42   15    0    0                                                              

 14 21.3 13.0  .1636  550  20     90  6    84   16   44   25    0    0   11.8   121   8.6   11.6   112   8.0   10.0    28   2.0 10.4
 15 22.0 13.7  .1864  550  21    103  6    97   16   45   36    0    0   12.5   133   8.9   12.3   124   8.3   11.0    39   2.6 11.2
 16 22.7 14.5  .2100  550  22    116  6   109   15   46   48    0    0   13.0   146   9.1   12.7   137   8.6   12.0    51   3.2 12.0
 17 23.3 15.3  .2345  550  23    129  6   123   15   46   61    0    0   13.5   160   9.4   13.2   150   8.8   12.9    64   3.8 12.8
 18 23.9 15.9  .2598  550  25    143  7   136   15   46   74    1    0   13.9   174   9.6   13.6   164   9.1   13.8    78   4.3 13.6
                   40%                                                                                                           
THN 21.6 15.9  .2142  220   8     47  2    45    6   19   20    0    0                                                              
REM 25.3 15.9  .2901  330  17     96  4    92    9   28   54    1    0                                                              

 19 26.0 16.6  .3203  330  17    106  4   101    9   27   63    3    0   10.0   184   9.7    9.8   174   9.1   10.5    88   4.6 11.0
 20 26.6 17.3  .3519  330  18    116  5   112    9   27   71    5    0   10.4   194   9.7   10.2   184   9.2   10.6    99   4.9 11.7
 21 27.2 18.0  .3849  330  19    127  5   122    9   27   80    8    0   10.9   205   9.8   10.7   194   9.3   11.1   110   5.2 12.4

Expected result:

 C03     103                                                         
 THN  7 11.4  6.7  .0230  164   2      4  1     3    2    0    0    0    0                                                              
 THN 13 17.5 12.2  .1044  265   6     28  2    25    8   15    3    0    0                                                              
 THN 18 21.6 15.9  .2142  220   8     47  2    45    6   19   20    0    0
 C04 103
 THN etc.
.
.
.
.

Or just:

1(P1)      3 P.ell    05/0120 W/P068819 0 12.0 98/99  380 380           C03     103          PROCESSED 21/02/16 TIME 22.16.52 KSINA=8  
THN  7 11.4  6.7  .0230  164   2      4  1     3    2    0    0    0    0                                                              
THN 13 17.5 12.2  .1044  265   6     28  2    25    8   15    3    0    0                                                              
THN 18 21.6 15.9  .2142  220   8     47  2    45    6   19   20    0    0
C04 103
THN etc.
.
.
.
.

I added the second expected result option because my question was not clear enough.

The 'C03 103' line was for the example input text I gave (above).

These two variables will change to 'C04 103' or 'A03a 104' or 'D051a 101' etc.

If the result can just supply the line containing those variables 1(P1) ..... that will be enough.

The data comes in repeated batches with the start line 1(P1).....

See example above.

Many thanks for the help. I am almost there.

My final edit, if this helps:

The full normal input batch is as follows. I cut some out to save space.

Probably not a good idea:

1(P1)      3 P.ell    05/0120 W/P068819 0 12.0 98/99  380 380           C03     104          PROCESSED 21/02/16 TIME 22.16.52 KSINA=8

AGE   SPH   %THN  %INC     SV SI   MAI20  HTPC  VIPC   AGE BA  DBH   HT  SPH   CIH% CIV%    CVD BCON    CMAI      C0   C0CAL    SI20
0  1100     .0  89.0%SPH  2  2     .00    .0    .0 20.00  1   .0 17.3    0     .0   .0     .0   .0    .0000    .000    0%      .00
7   815   25.9    .0      2  2    9.90  75.5  47.2 20.00  1 26.6 17.3  330     .0   .0     .0 13.0    .2099   1.005   .000   17.30
13   550   32.5    .0                                                                                                              
18   330   40.0    .0                                                                                                              
45     0  100.0    .0                                                                                                              
0SQ     -4  -4  -4  = SI20  17 17 17              PLANTN---104 GREEN MEADOWS       MODEL---P.ELLIOTTII MAC MAC   SQ    10 SI20  22.90
HTPC   76  76  76  =MAI20  10 10 10 FROM HTPC                                                                   HTPC 100 MAI20 20.71
VIPC   47  47  47  =MAI20  10 10 10              HTPC/VIPC REGRESSION---P.ELLIOTTII GENERAL 1/83                VIPC 100 MAI20 20.99
                 MAIDBH     0                                                                                                    
                 INMAI==>   0                                                                                                    
0INPUT FOR CALCULATING HTPC & VIPC =  HT  ---- ----                                                                                  

AGE  DBH   HT  VTREE  SPH  BA  TOTAL WS  UTIL    S    A    B    C    D   TCAI CTCAI  TMAI   UCAI CUCAI  UMAI   SCAI CSCAI  SMAI IAGE

1   .0   .2  .0000  979   0      0  0     0    0    0    0    0    0     .0     0    .0     .0     0    .0     .0     0    .0  1.0
2   .0   .9  .0000  979   0      0  0     0    0    0    0    0    0     .0     0    .0     .0     0    .0     .0     0    .0  2.0
3  3.9  2.0  .0007  979   1      1  1     0    0    0    0    0    0     .7     1    .2     .0     0    .0     .0     0    .0  3.0
4  7.1  3.4  .0041  979   4      4  3     1    1    0    0    0    0    3.4     4   1.0     .6     1    .2     .0     0    .0  4.0
5  9.4  4.6  .0102  979   7     10  5     5    5    0    0    0    0    5.9    10   2.0    4.1     5    .9     .0     0    .0  5.0
6 11.3  5.7  .0188  979  10     18  6    12   12    1    0    0    0    8.4    18   3.1    7.5    12   2.0     .0     0    .0  6.0
7 13.0  6.7  .0293  979  13     29  7    22   19    3    0    0    0   10.3    29   4.1    9.7    22   3.1     .0     0    .0  7.0
                   17%                                                                                                           
THN 11.4  6.7  .0230  164   2      4  1     3    2    0    0    0    0                                                              
REM 13.4  6.7  .0315  815  12     26  6    20   17    3    0    0    0                                                              

 8 15.0  7.6  .0453  815  14     37  6    31   21   10    0    0    0   11.2    40   5.0   10.9    33   4.1     .0     0    .0  7.6
 9 16.4  8.5  .0607  815  17     49  6    43   23   20    0    0    0   12.5    52   5.8   12.2    45   5.0     .2     0    .0  8.6
10 17.4  9.4  .0771  815  19     63  7    56   24   30    2    0    0   13.4    66   6.6   13.1    58   5.8    1.3     2    .2  9.6
11 18.3 10.3  .0941  815  21     77  7    70   24   41    5    0    0   13.9    80   7.3   13.6    72   6.5    3.0     5    .4 10.6
12 19.0 11.3  .1118  815  23     91  7    84   24   50   10    0    0   14.4    94   7.8   14.1    86   7.2    5.4    10    .8 11.6
13 19.6 12.2  .1299  815  25    106  8    98   24   56   18    0    0   14.7   109   8.4   14.4   100   7.7    8.0    18   1.4 12.6
                   33%                                                                                                           
THN 17.5 12.2  .1044  265   6     28  2    25    8   15    3    0    0                                                              
REM 20.6 12.2  .1421  550  18     78  5    73   16   42   15    0    0                                                              

14 21.3 13.0  .1636  550  20     90  6    84   16   44   25    0    0   11.8   121   8.6   11.6   112   8.0   10.0    28   2.0 10.4
15 22.0 13.7  .1864  550  21    103  6    97   16   45   36    0    0   12.5   133   8.9   12.3   124   8.3   11.0    39   2.6 11.2
16 22.7 14.5  .2100  550  22    116  6   109   15   46   48    0    0   13.0   146   9.1   12.7   137   8.6   12.0    51   3.2 12.0
17 23.3 15.3  .2345  550  23    129  6   123   15   46   61    0    0   13.5   160   9.4   13.2   150   8.8   12.9    64   3.8 12.8
18 23.9 15.9  .2598  550  25    143  7   136   15   46   74    1    0   13.9   174   9.6   13.6   164   9.1   13.8    78   4.3 13.6
                   40%                                                                                                           
THN 21.6 15.9  .2142  220   8     47  2    45    6   19   20    0    0                                                              
REM 25.3 15.9  .2901  330  17     96  4    92    9   28   54    1    0                                                              

19 26.0 16.6  .3203  330  17    106  4   101    9   27   63    3    0   10.0   184   9.7    9.8   174   9.1   10.5    88   4.6 11.0
20 26.6 17.3  .3519  330  18    116  5   112    9   27   71    5    0   10.4   194   9.7   10.2   184   9.2   10.6    99   4.9 11.7
21 27.2 18.0  .3849  330  19    127  5   122    9   27   80    8    0   10.9   205   9.8   10.7   194   9.3   11.1   110   5.2 12.4
22 27.9 18.7  .4192  330  20    138  5   133    8   26   87   11    0   11.3   216   9.8   11.1   206   9.3   11.5   121   5.5 13.2
23 28.4 19.3  .4546  330  21    150  5   145    8   26   94   16    0   11.7   228   9.9   11.4   217   9.4   11.8   133   5.8 14.0
24 29.0 20.0  .4914  330  22    162  5   157    8   26  101   22    0   12.2   240  10.0   11.9   229   9.5   12.3   145   6.1 14.9
25 29.6 20.6  .5292  330  23    175  6   169    8   25  106   29    0   12.5   253  10.1   12.2   241   9.6   12.6   158   6.3 15.7
26 30.2 21.2  .5682  330  24    188  6   182    8   25  112   37    0   12.9   265  10.2   12.6   254   9.8   13.0   171   6.6 16.5
27 30.7 21.8  .6083  330  25    201  6   194    8   25  115   46    0   13.2   279  10.3   13.0   267   9.9   13.3   184   6.8 17.3
28 31.3 22.4  .6492  330  25    214  7   208    8   24  119   56    1   13.5   292  10.4   13.2   280  10.0   13.6   198   7.1 18.2
29 31.9 23.0  .6908  330  26    228  7   221    8   24  122   65    2   13.7   306  10.5   13.5   293  10.1   13.8   212   7.3 19.0
30 32.4 23.5  .7332  330  27    242  7   235    8   24  123   77    3   14.0   320  10.7   13.7   307  10.2   14.0   226   7.5 19.8
31 33.0 23.9  .7766  330  28    256  7   249    8   24  125   88    5   14.3   334  10.8   14.0   321  10.4   14.3   240   7.7 20.4
32 33.6 24.4  .8202  330  29    271  8   263    8   23  126   99    7   14.4   349  10.9   14.1   335  10.5   14.4   255   8.0 21.0

Final, Final Edit: I forgot to state that I need to read off the last age recorded just before the 'THN' line. This age reading refers to the 'THN' line immediately below it. I have modified the expected output to include the age reading after the 'THN' entry.

I had not realized this until now and therefore it did not appear in my attempted code.

Upvotes: 3

Views: 211

Answers (2)

Sobrique
Sobrique

Reputation: 53478

Honestly - for what you're trying to do, the answer is probably not a complicated regex.

I mean, based on what you've specified, this will do it:

#!/usr/bin/env perl

use strict;
use warnings;

while ( <> ) {
    if ( m/PROCESSED/ ) { 
        print join " ",m/(\w+)\s+(\w+)\s+PROCESSED/,"\n";
    }
    print if /^THN/;
}

If your thing is more complicated than that, I'd suggest rather than regexing each line, you can instead:

#!/usr/bin/env perl

use strict;
use warnings;

my @header;
while (<>) {
   if ( m/PROCESSED/ ) { 
        print join " ",m/(\w+)\s+(\w+)\s+PROCESSED/,"\n";
    }
    if (m/AGE/) {
      chomp( @header = split );
   }
   my %this_row;
   @this_row{@header} = split;
   if ( $this_row{AGE} and $this_row{AGE} eq "THN" ) { 
        print join ("\t", @this_row{@header[0..13]}),"\n";
   }
}

This makes use of hash slicing, to store each row as a hash, based on that header row. This is useful, because it means you can refer to elements by heading. E.g.

$this_row{AGE};

Overkill if you just want to print as you go though, where the first answer will do.

Based on comments:

  • Data::Dumper is a way of printing a data structure (like a hash). Just removing it won't work, because you need to 'handle' printing the hash.
  • You've got a line with THN in it. But you can still use the first snippet if you use /^THN/ instead.

Upvotes: 2

Chris Lear
Chris Lear

Reputation: 6742

It's not easy to know exactly what you're trying to do, but maybe you need an inner loop.

Something like this

while ( my $line = <INPUT> ) {

    if($line =~ /\s{6,11}(\w{1}\d{1}\w{0,5})\s{0,5}(\d{3})/) {
        my @c_no = "$1,$2\n";
        foreach (@c_no) {
            print OUTPUT $_;
        }
        while ( my $line = <INPUT> ) {
            if ($line =~ /^(\s{1}THN)(\s{1,3}\d{0,2}.\d)(\s{1,3}\d{0,2}.\d)(\s{1,2}\d{0,1}.\d{4})(\s{1,2}\d{2,4})
                (\s{2,3}\d{1,2})(\s{1,6}\d{1,4})(\s{1,2}\d{1,2})(\s{1,5}\d{1,4})(\s{1,4}\d{1,4})
                (\s{1,4}\d{1,4})(\s{1,4}\d{1,4})(\s{1,4}\d{1,4})(\s{1,4}\d{1,4})|(^1(P1))/x){

                print OUTPUT "$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14\n";
                last;
            }
        }
    }
}

Upvotes: 0

Related Questions