Kero

Reputation: 1944

How to extract a sentence that matches a string using a regex in Perl?

I am trying to return the sentence that matches a string. Long story short, I am using the WWW::Wikipedia module, which searches Wikipedia for a keyword and returns the page that contains it. I then search through the returned page and try to find the matching sentence. For example, if the returned text looks like this:

'Machine learning' is the subfield of computer science that, according to
Arthur Samuel in 1959, gives "computers the ability to learn without being
explicitly
programmed."[https://www.cims.nyu.edu/~munoz/files/ml_optimization.pdf Machine
Learning and Optimization] Evolved from the study of pattern recognition and
computational learning theory in artificial intelligence,http://www.britannica.com/EBchecked/topic/1116194/machine-
learning machine learning explores the study and construction of algorithms
that can learn from and make predictions on data – such algorithms overcome
following strictly static program instructions by making data-driven
predictions or decisions, through building a model from sample inputs. Machine
learning is employed in a range of computing tasks where designing and
programming explicit algorithms with good performance is difficult or
unfeasible example applications include email filtering, detection of network
intruders or malicious insiders working towards a data breach, optical
character recognition OCR,Wernick, Yang, Brankov, Yourganov and Strother,
Machine Learning in Medical Imaging, [[IEEE Signal Processing Society|IEEE
Signal Processing Magazine]], vol. 27, no. 4, July 2010, pp. 25–38 

and I want to find the sentence that contains "machine learning is", so I wrote these simple subroutines in Perl using a regex.

sub findSubject {
   my ($question) = @_ ;
   my $search;
   my @words;
   my $tobe;
   my $tail="";
   my @verbs =('what','who','when','where');
   $question =~ s/\s*[.?!]$//gi;
   $question =~ s/(^\s+|\s+)$//g;
    if ($question =~ m/(who|what)/i){
         $question =~ s/(who|what)//i;
         @words=split /\s+/,$question;
         shift @words ;
         $tobe = shift @words ;
         $search = join " ",@words;
         return ($search,$tobe,$tail);
     } elsif ($question =~ m/(when|where)/i)
    {
        $question =~ s/(where|when)//i;
        @words=split /\s+/,$question;
        shift @words ;
        $tail = pop @words;
        $tobe = shift @words;
        $search = join " ",@words;
        return ($search,$tobe,$tail);
    }
    else {
         return undef;
    }

}

sub findAnswer{
    my ($result,$search,$tobe,$tail)=@_;

    my $outFormat;
    if ($tail eq ""){
        $result =~s/[\)\(\}\{\;]//gi;
        print "$result\n";
        my $out = $result=~ m/\b\'?$search\'?\b\s*\b$tobe\b\s*(.*)\./gi;
        print "$out\n";
        $outFormat = "$search $tobe $out";
        return $outFormat ;
    }else {
        return "golden";
    }
}
sub main{
   my $f2;
   my @inputs= @ARGV;
   my $searchInput = 'here';
   my $searchOutput;
   my $tobe;
   my $tail;
   openingStatement();
   open($f2, ">:utf8", $inputs[0]);
    while(my $q = <STDIN>) {
        my $wiki = WWW::Wikipedia->new( clean_html => 1 );
        chomp $q;
        $q = lc $q;

        if ($q eq 'exit'){
            exit;
        }
        else {
            ($searchInput,$tobe,$tail) = findSubject($q);
            if ($searchInput){
                my $result = $wiki->search( $searchInput );
                if ( $result ) {
                   my $res = $result->fulltext();
                   if ($res){
                       $searchOutput = findAnswer($res,$searchInput,$tobe,$tail);
                       binmode(STDOUT, ":utf8");
                       print "$searchOutput\n" ;
                       print $f2  "$searchOutput\n";
                   }

                }else {
                    print "i dont know the answer \n";
                }
            }else {
               print "program accept only questions type {who, what, when, where}\n";
            }

        }
    }

   close $f2;
}

where $result is the paragraph I am parsing, and $search and $tobe are the words I am looking for, in this case "machine learning" and "is".

But I am getting this weird error:

Use of uninitialized value $_ in pattern match (m//) at qa-system.pl line 52,
    <STDIN> line 1 (#1)
    (W uninitialized) An undefined value was used as if it were already
    defined.  It was interpreted as a "" or a 0, but maybe it was a mistake.
    To suppress this warning assign a defined value to your variables.

    To help you figure out what was undefined, perl will try to tell you
    the name of the variable (if any) that was undefined.  In some cases
    it cannot do this, so it also tells you what operation you used the
    undefined value in.  Note, however, that perl optimizes your program
    and the operation displayed in the warning may not necessarily appear
    literally in your program.  For example, "that $foo" is usually
    optimized into "that " . $foo, and the warning will refer to the
    concatenation (.) operator, even though there is no . in
    your program.

and the subroutine returns this strange value:

machine learning is 18446744073709551615

Any clue why I am getting this error? Is there an error in my regex?

Upvotes: 1

Views: 532

Answers (1)

Schwern

Reputation: 164819

You have a typo.

$out = $result= ~ m/\s*\'?$search\'? $tobe (.*)\s*\./i;
              ^^^

The =~ operator cannot have a space in it. With a space it's assignment (=) and bitwise negation (~).

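So $result = ~ m/.../ means "set $result equal to the bitwise negation of the result of $_ =~ m/.../". In other words, nonsense. (Many Perl operators and built-in functions use $_ if no variable is supplied.) That accounts for both symptoms: $_ is undefined, hence the warning, and the failed match counts as 0, whose bitwise negation on a 64-bit perl is 18446744073709551615.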
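
To make the difference concrete, here is a minimal sketch. The sample text and the variable names $text, $wrong and $rest are made up for illustration and are not from the question's program. It also shows one further point worth checking once the typo is fixed: in scalar context a match only returns true or false, so the capture has to be taken in list context to get the matched text.

```perl
use strict;
use warnings;

# Hypothetical sample text and search terms, standing in for the Wikipedia page.
my $text   = "Machine learning is the subfield of computer science. "
           . "It evolved from the study of pattern recognition.";
my $search = 'machine learning';
my $tobe   = 'is';

# Typo form: "= ~" is an assignment plus bitwise negation of a match on $_.
local $_ = '';                    # defined here only to keep the warning quiet
my $wrong = ~ m/\Q$search\E/i;    # the match fails, counts as 0, and is negated

# Intended form: "=~" binds the match to $text, and the parentheses around
# $rest put the match in list context so the capture group is returned.
my ($rest) = $text =~ m/\b\Q$search\E\s+\Q$tobe\E\s+([^.]*)\./i;

print "wrong: $wrong\n";
print "right: $search $tobe $rest\n" if defined $rest;
```

The \Q...\E escapes are an addition of mine so that any regex metacharacters in the search terms are matched literally, and [^.]* stops the capture at the first period instead of running to the last one as (.*)\. would.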

Upvotes: 1
