Boggarapu Lokesh

Reputation: 11

How to print a string inside double quotes inside open brackets?

/* start of maker a_b.c[0] */

/* start of maker a_b.c[1] */

maker ( "a_b.c[0]" )

maker ( "a_b.c[1]" )

How can I extract the strings inside double quotes and store them in an array? Here's what I have tried.

open(file, "P2.txt");
@A = (<file>) ;
foreach $str(@A)
{
     if($str =~ /"a_b.c"/)
       {
           print "$str \n"; 
       } 
} 

Note: Only the content inside double quotes has to be stored in the array. The first line of the example, inside the comment delimiters, contains the same string that I want to match, but it should not be printed. Even if the same string is repeated somewhere else without double quotes, it should not be printed.

Upvotes: 0

Views: 250

Answers (2)

Sinan Ünür

Reputation: 118148

Looking at the sample input you provided, the task can be paraphrased as "extract single string arguments to things that look like function invocations". There seems to be the added complication of not matching inside C-style comments. For that, see perldoc -q comment.

As the FAQ entry demonstrates, ignoring content in arbitrary C-style comments is generally not trivial. I decided to try C::Tokenize to help:

#!/usr/bin/env perl

use strict;
use warnings;

use feature 'say';

use C::Tokenize qw( tokenize );
use Path::Tiny qw( path );

sub is_open_paren {
    ($_[0]->{type} eq 'grammar') && ($_[0]->{grammar} eq '(');
}

sub is_close_paren {
    ($_[0]->{type} eq 'grammar') && ($_[0]->{grammar} eq ')');
}

sub is_comment {
    $_[0]->{type} eq 'comment';
}

sub is_string {
    $_[0]->{type} eq 'string';
}

sub is_word {
    $_[0]->{type} eq 'word';
}

sub find_single_string_args_in_invocations {
    my ($source) = @_;

    my $tokens = tokenize(path( $source )->slurp);

    # Stop three tokens early so the look-ahead below never runs past the end
    for (my $i = 0; $i + 3 < @$tokens; ++$i) {
        next if is_comment( $tokens->[$i] );

        next unless is_word( $tokens->[$i] );
        next unless is_open_paren( $tokens->[$i + 1] );
        next unless is_string( $tokens->[$i + 2] );
        next unless is_close_paren( $tokens->[$i + 3]);

        say $tokens->[$i + 2]->{string};
        $i += 3;
    }
}

find_single_string_args_in_invocations($ARGV[0]);

which, with your input, yields:

C:\Temp> perl t.pl test.c
"a_b.c[0]"
"a_b.c[1]"

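To illustrate why the FAQ treats comment handling as non-trivial, here is a deliberately naive, core-Perl-only sketch. It is not the FAQ's regex and not what the script above does; the sub name naive_string_args and the second input line are made up for the example. It deletes anything that looks like /* ... */ and then looks for word ( "string" ).

```perl
use strict;
use warnings;
use feature 'say';

# Naive approach: delete anything that looks like a C comment, then match
sub naive_string_args {
    my ($source) = @_;
    # Blindly removes /* ... */ even when it sits inside a string literal
    $source =~ s{/\*.*?\*/}{}gs;
    # In list context, /g returns every captured quoted argument
    return $source =~ /\b\w+\s*\(\s*("(?:[^"\\]|\\.)*")\s*\)/g;
}

# Input shaped like the sample: the comment is skipped, the argument is found
say for naive_string_args(qq{/* start of maker a_b.c[0] */\nmaker ( "a_b.c[0]" )\n});

# Hypothetical input: the string itself contains comment delimiters
say for naive_string_args(qq{maker ( "a /* b */ c" )\n});
```

The first call prints the argument intact, but the second loses the middle of the string because the substitution cannot tell a comment from string contents. A tokenizer avoids that class of mistake.
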
Upvotes: 1

Dave Cross

Reputation: 69294

It's not about looking for strings in double quotes. It's about defining a pattern (a regular expression) that matches the lines that you want to find.

Here's the smallest change that I can make to your code in order to make this work:

open(file, "P2.txt");
@A = (<file>) ;
foreach $str(@A)
{
     if($str =~ /"a_b.c/)  # <=== Change here
       {
           print "$str \n"; 
       } 
} 

All I've done is to remove the closing double-quote from your match expression. Because you don't care what comes after that, you don't need to specify it in the regular expression.

I should point out that this isn't completely correct. In a regular expression, a dot has a special meaning (it means "match any character here") so to match an actual dot (which is what you want), you need to escape the dot with a backslash. So it should be:

if($str =~ /"a_b\.c/)
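
To see the difference the backslash makes, here is a small check. The two strings are made up for the illustration (the second has an X where the dot should be); neither is read from your file.

```perl
use strict;
use warnings;

# Hypothetical inputs: the second has an 'X' where the dot should be
my @samples = ('maker ( "a_b.c[0]" )', 'maker ( "a_bXc[0]" )');

for my $str (@samples) {
    # Unescaped dot: matches any character in that position
    my $unescaped = $str =~ /"a_b.c/  ? 'match' : 'no match';
    # Escaped dot: matches only a literal '.'
    my $escaped   = $str =~ /"a_b\.c/ ? 'match' : 'no match';
    print "$str: unescaped=$unescaped, escaped=$escaped\n";
}
```

The unescaped pattern accepts both strings; the escaped one accepts only the first.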

Rewriting to use a few more modern Perl practices, I would do something like this:

# Two safety nets to find problems in your code
use strict;
use warnings;

# say() is a better print()
use feature 'say';

# Use a variable for the filehandle (and declare it with 'my')
# Use three-arg version of open()
# Check return value from open() and die if it fails
open(my $file, '<', "P2.txt") or die $!;

# Read data directly from filehandle
while (my $str = <$file>)
{
     if ($str =~ /"a_b\.c/)
       {
           say $str; 
       } 
}

You could even use the implicit variable ($_) and statement modifiers to make your loop even simpler.

while (<$file>) {
  say if /"a_b\.c/;
}
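
None of the loops above actually store anything. If you want the quoted text in an array, as the question asks, one option is to put capturing parentheses in the regex. This is only a sketch: it assumes each call sits on a single line and looks like your sample, it matches any quoted argument and not just ones starting with a_b.c, and it reads from an in-memory copy of the sample instead of P2.txt so that it runs on its own. The array name @names is my own choice.

```perl
use strict;
use warnings;
use feature 'say';

# Assumption: this sample data stands in for the contents of P2.txt
my $data = <<'END';
/* start of maker a_b.c[0] */
/* start of maker a_b.c[1] */
maker ( "a_b.c[0]" )
maker ( "a_b.c[1]" )
END

# Open a read-only filehandle on the string (an in-memory file)
open(my $file, '<', \$data) or die $!;

my @names;
while (<$file>) {
    # Capture whatever sits between the double quotes inside ( ... )
    push @names, $1 if /\(\s*"([^"]*)"\s*\)/;
}
close $file;

say for @names;
```

With the sample data this stores a_b.c[0] and a_b.c[1] and skips the comment lines, because they contain no quoted string in parentheses.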

Upvotes: 1
