Find line and column number in the text file using perl

Question

I am trying to get the line and column number where a string exactly matches in a file. So far I am able to get the line number, but not the column number.

I need to find \amp in the string below:

$str = '\begin{document}
    \title{Testing}
    It is important that the final model or models should make sense
    physically: at a minimum, this usually means that interactions should
    not be included without main effects nor higher-degree polynomial
    terms without their lower-degree relatives. Furthermore, if the model
    is to be used as a summary of the findings of one out of several
    studies bearing on the same phenomenon, main effects would usually be
    included whether significant or not.

    \begin{align}\label{equilibrium-disp-cyl}
    &G\left( {{\nabla ^{2}}{u_{r}} - \frac{2}{{{r^{2}}}}\frac{{\partial
{u_{\theta} }}}{{\partial \theta }} - \frac{{{u_{r}}}}{{{r^{2}}}}}
\right) \nonumber\\
\frac{1}{r}\frac{{\partial {u_{\theta} \amp}}}{{\partial \theta }} +
)\frac{1}{r}\frac{\partial }{{\partial \theta }}\left(
{\frac{{\partial {u_{r}}}}{{\partial r}} + \frac{{{u_{r}}}}{r} +
&G{\nabla ^{2}}{u_{z}} + ( {\lambda + G} )\frac{\partial }{{\partial
\end{align}
some para text continues....
    \begin{align}\label{equilibrium-disp-cyl}
&G\left( {{\nabla ^{2}}{u_{r}} - \frac{2}{{{r^{2}}}}\frac{{\partial
{u_{\theta} }}}{{\partial \theta }} - \frac{{{u_{r}}}}{{{r^{2}}}}}
\right) \nonumber\\
\frac{1}{r}\frac{{\partial {u_{\theta}}}}{{\partial \theta }} +
)\frac{1}{r}\frac{\partial }{{\partial \theta }}\left(
{\frac{{\partial {u_{r}}}}{{\partial r}} + \frac{{{u_{r}}}}{r} +
&G{\nabla ^{2}}{u_{z}} + ( {\lambda + G} \amp )\frac{\partial }{{\partial
\end{align}
some para text continues....
    \begin{align}\label{equilibrium-disp-cyl}
&G\left( {{\nabla ^{2}}{u_{r}} - \frac{2}{{{r^{2}}}}\frac{{\partial
{u_{\theta} }}}{{\partial \theta }} - \amp \frac{{{u_{r}}}}{{{r^{2}}}}}
\right) \nonumber\\
\frac{1}{r}\frac{{\partial {u_{\theta}}}}{{\partial \theta }} +
)\frac{1}{r}\frac{\partial }{{\partial \theta }}\left(
{\frac{{\partial {u_{r}}}}{{\partial r}} + \frac{{{u_{r}}}}{r} +
&G{\nabla ^{2}}{u_{z}} + ( {\lambda + G} \amp )\frac{\partial }{{\partial
\end{align}
';

My Code:

my $_pres = ();
while($str=~m/\\begin\{align\}((?:(?!\\end\{align\}).)*)\\end\{align\}/sg)
{
    $_pres = $`; my $nolabel = $&;
    if($nolabel=~m/\\amp/i)
    {
        my $nwpre = $`; $newpre = $_pres.$nwpre;

        my ($line) = ($newpre =~s/\n/\n/g)+1;
        print "L: $line - Found amp...!!!\n";
    }
}

Output:

 L: 8 - Found amp...!!!
 L: 21 - Found amp...!!!
 L: 26 - Found amp...!!!

Expected output:

 L: 7:nn - \amp command found ...!!!

Could someone please guide me on how to get the column number as well? It would be much appreciated.

zdim · Accepted Answer

I take it that the \begin\{align\} and \end\{align\} patterns are there to locate such passages (LaTeX's align environment) within a larger body of text.

Once you have that, break the captured text into lines; finding the location of \amp is then easy:

use warnings;
use strict;

# ADDED another "\amp", to the line before last
my $str = '\begin{align}\label{equilibrium-disp-cyl}
    ... [ suppressed for brevity ]
\right) = 0, \amp
\end{align}
';

while ($str =~ m/\\begin\{align\} (.*?) \\end\{align\}/sgx)
{
    my @lines = split /\n/, $1;
    for my $i (0..$#lines)
    {
        my $line = $lines[$i];

        if ($line =~ /(\\amp)/i)
        {
            print  "Found '$1' -- ";
            printf "Line number: %3d, match start: %2d, match end: %2d\n",
                $i+1, $-[0], $+[0];
        }
    }
}

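This uses the @- (@LAST_MATCH_START) and @+ (@LAST_MATCH_END) arrays, which hold the offsets of the start and end of the last successful match and its submatches. See Regex related variables in perlvar. As there is only one match, I use the first element of each, $-[0] and $+[0], which refer to the entire match. These offsets are 0-based, so add 1 if you want a 1-based column number.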
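A minimal sketch of how @- and @+ behave on a single line. The sample string in $line is hypothetical (not from the question), and reporting a 1-based column is an assumption about what the "L: 7:nn" format in the question wants.

```perl
use strict;
use warnings;

# Hypothetical sample line, used only for illustration.
my $line = 'abc \amp def';

my ($start, $end, $column);
if ($line =~ /(\\amp)/) {
    # $-[0] and $+[0]: 0-based offsets where the whole match starts and ends.
    # $-[1] and $+[1] would give the same for capture group 1; here they are
    # identical because the group spans the whole pattern.
    ($start, $end) = ($-[0], $+[0]);
    $column = $start + 1;    # 1-based column, as editors usually display it
    print "Found '$1' at offset $start..$end (column $column)\n";
}
# Found '\amp' at offset 4..8 (column 5)
```

Note that $+[0] points one past the last matched character, so $+[0] - $-[0] is the length of the match.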

I use the simple non-greedy (.*?) instead of the unneeded negative lookahead in the middle.
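To check that the non-greedy capture and the question's negative-lookahead construct extract the same passages, here is a small comparison on a hypothetical two-block string. A greedy (.*) is included only for contrast: it runs on to the last \end{align} and merges everything in between into one capture.

```perl
use strict;
use warnings;

# Two small, hypothetical align blocks separated by ordinary text.
my $text = "\\begin{align}A\\end{align} text \\begin{align}B\\end{align}";

# Non-greedy: (.*?) stops at the first \end{align} after each \begin{align}.
my @lazy = $text =~ /\\begin\{align\}(.*?)\\end\{align\}/sg;

# Negative lookahead, as in the question: same captures, more verbose.
my @tempered = $text =~ /\\begin\{align\}((?:(?!\\end\{align\}).)*)\\end\{align\}/sg;

# Greedy (.*) for contrast: runs on to the LAST \end{align}.
my @greedy = $text =~ /\\begin\{align\}(.*)\\end\{align\}/sg;

print "lazy: @lazy\n";                           # lazy: A B
print "tempered: @tempered\n";                   # tempered: A B
print "greedy: ", scalar(@greedy), " capture\n"; # greedy: 1 capture
```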

With your whole string restored (plus the extra \amp), the above prints

Found '\amp' -- Line number:   7, match start: 39, match end: 43
Found '\amp' -- Line number:  14, match start: 13, match end: 17

where I've added another \amp on the line-before-last, for a better test.

Clarification: We need the line number in the whole file, and the column within that line, of each \amp found inside LaTeX's align environment (delimited by \begin{align} and \end{align}).

use warnings;
use strict;

my $file = 'doc.tex';
open my $fh, '<', $file or die "Can't open $file: $!";

while (<$fh>)
{
    if (/\\begin\{align\}/ .. /\\end\{align\}/)
    {
        if (/(\\amp)/i)
        {
            print  "Found '$1' -- ";
            printf "Line number: %3d, match start: %2d, match end: %2d\n",
                $., $-[0], $+[0];
        }
    }
}

where the if statement uses the range (flip-flop) operator to ensure that the \amp match is performed only within the align environment. The $. variable holds the current line number of the last-read filehandle, and @- and @+ are used as explained above.
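The same flip-flop plus $. technique can be tried without a doc.tex file by opening an in-memory filehandle on a string. This is a sketch with hypothetical contents; unlike the answer's single if, it loops with m//g so that every \amp on a line is reported (a variation, not what the answer does), and it reports 1-based columns.

```perl
use strict;
use warnings;

# Hypothetical document contents standing in for doc.tex.
my $doc = <<'TEX';
intro \amp outside
\begin{align}
x = y \amp z \amp
\end{align}
after \amp outside
TEX

# Perl can open a filehandle on a scalar reference (an in-memory file).
open my $fh, '<', \$doc or die "Can't open in-memory file: $!";

my @hits;    # each entry: [line number, 1-based column]
while (<$fh>) {
    # Flip-flop: true from a \begin{align} line through the next \end{align} line.
    if (/\\begin\{align\}/ .. /\\end\{align\}/) {
        # m//g in scalar context visits every \amp on the current line.
        while (/\\amp/g) {
            # $. is the input line number; $-[0] is the 0-based match start.
            push @hits, [ $., $-[0] + 1 ];
        }
    }
}
close $fh;

printf "line %d, column %d\n", @$_ for @hits;
# line 3, column 7
# line 3, column 14
```

The \amp occurrences on lines 1 and 5 are skipped because the flip-flop is false outside the align environment.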

With doc.tex containing the text shown in the question, this prints

Found '\amp' -- Line number:  15, match start: 39, match end: 43
Found '\amp' -- Line number:  28, match start: 41, match end: 45
Found '\amp' -- Line number:  33, match start: 38, match end: 42
Found '\amp' -- Line number:  38, match start: 41, match end: 45

which I can confirm as correct locations in that text.
