user3766874

Reputation: 813

Multiple occurrences of a string in a line to be grepped

I have two questions.

1: If I have a line like "Fruits: Mango Banana", I want to capture the "Mango Banana" part and assign it to another variable. Currently I am doing this:

if(/$line == Fruits:\s(\w+)/){
myFav=$1;
}

but it returns only "Mango", not "Mango Banana". Can anyone suggest how to get the full Fruits list, delimited by spaces?

2: If the same string is repeated in a line, I want to capture all the occurrences.

E.g., if I have a line like "I have Fruit: Mango and the color of the Fruit: Banana is green", I want to capture both the Mango and Banana values.

if(/$line == Fruit:\s(\w+)/){
myFav=$1;
}

The above code stops searching after the first occurrence of "Fruit:". Can anyone help with these two questions?

Thanks in advance :)

Upvotes: 0

Views: 63

Answers (5)

David W.

Reputation: 107040

In your first example, you're using:

if ( /$line == Fruits:\s(\w+)/ ) {

First, you should use =~ instead of == for regular expressions. Second, the slashes go around the regular expression only, like this:

if ( $line =~ /Fruits:\s(\w+)/ ) {

Now, the \w+ is for a word which includes letters, numbers, underscores and that's it. It doesn't match spaces.

You have:

Fruits: Mango Banana

So, \w+ will match Mango, but will stop matching at the space after Mango.

If you want to match both:

if ( $line =~ /^Fruits:\s+(.+)/ ) {

Note that the . will match any character (except a newline), including a space. The plus sign means match one or more of the preceding item; an asterisk matches zero or more. Note that I also use \s+ instead of just \s. This way, if there's more than one space after Fruits:, you'll still have a match.
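As a minimal sketch of the pattern above: the helper name fruits_from_line is hypothetical, and splitting the captured text into individual names is an assumption about what the asker wants to do with the list.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the names that follow "Fruits:" on a line.
sub fruits_from_line {
    my ($line) = @_;

    # \s+ tolerates several spaces after the colon; (.+) captures the
    # rest of the line (the dot does not match a trailing newline).
    if ( $line =~ /^Fruits:\s+(.+)/ ) {
        my $list = $1;

        # split ' ' splits on runs of whitespace and drops empty fields.
        return split ' ', $list;
    }
    return;    # empty list when the line does not match
}

my @fruits = fruits_from_line("Fruits:   Mango Banana");
print join( ", ", @fruits ), "\n";    # prints "Mango, Banana"
```

If only the raw string is needed, $1 can be assigned directly instead of being split.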

In your second example, you can do this:

my @fruits = $line =~ /Fruit:\s+(\S+)/g;

The g on the end allows for multiple matches. Otherwise, only the first will be used. The \S represents any non-whitespace character, which will include possible dashes. This will put your matches into the array @fruits. Read the Regular Expression Tutorial. It'll help you understand what's going on a bit better.
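To see the effect of the g modifier, here is a small sketch that runs the same pattern with and without it on the sample line from the question. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $line = "I have Fruit: Mango and the color of the Fruit: Banana is green";

# Without /g, a list-context match returns the captures of the first match only.
my @first = $line =~ /Fruit:\s+(\S+)/;

# With /g, a list-context match returns the captures of every match.
my @all = $line =~ /Fruit:\s+(\S+)/g;

print "first: @first\n";    # prints "first: Mango"
print "all: @all\n";        # prints "all: Mango Banana"
```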

Always use use strict; and use warnings; in your program. It'll help you catch errors. You will have to declare variables with my, but it's worth it.

Upvotes: 0

Miller

Reputation: 35198

One solution would be to rely on the fact that your fruit names are capitalized.

However, I'd be tempted to lean toward having two regular expressions, one for Fruits and one for Fruit.

use strict;
use warnings;

while (<DATA>) {
    chomp;
    while (/Fruits?: ((?:[A-Z]\w*\s*)+)(?<!\s)/g) {
        print "Line $. - '$1'\n";
    }
}

__DATA__
Fruits: Mango Banana
I have Fruit: Mango and the color of the Fruit: Banana is green

Outputs:

Line 1 - 'Mango Banana'
Line 2 - 'Mango'
Line 2 - 'Banana'

Upvotes: 1

Dave Cross

Reputation: 69264

1/ The reason why your regex only returns "Mango" is that \w+ matches "word" characters. That is, numbers, letters and the underscore (i.e. the characters that are valid in Perl symbol names). If you want to match the space between two fruit names then you'll need to add a space (or, perhaps better, \s which matches all whitespace) to your regex. You probably want to put both of those atoms in a character class.

/Fruits:\s([\w\s]+)/

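2/ By default the match operator only matches the first occurrence of the regex in the input string. In order to match all of them, you need to add the /g option to the match operator. For a line with several single-word values, capture with \w+ here, because [\w\s]+ would run on through the words that follow the first value.

/Fruit:\s(\w+)/g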

Some other notes that you might find useful:

  • The Perl regex tutorial is a good way to learn this stuff
  • The Perl regex documentation has all the gory details.
  • The Perl operator documentation explains the match operator.
  • Adding use strict and use warnings to all of your code is a good habit to get into.
  • You match a string against a regex using the binding operator (=~), not the numeric equality operator (==). And the input string and the binding operator go outside the match operator.

    if($line =~ /Fruits:\s(\w+)/){
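As a rough illustration of how the two capture patterns discussed in this answer behave with /g on the second sample line, the sketch below compares \w+ with the [\w\s]+ character class. The variable names are made up for the example.

```perl
use strict;
use warnings;

my $line = "I have Fruit: Mango and the color of the Fruit: Banana is green";

# \w+ stops at the first space, so each match yields one fruit name.
my @single_words = $line =~ /Fruit:\s(\w+)/g;

# [\w\s]+ keeps going through spaces and words until it hits the next ":",
# so it swallows the second "Fruit" label and only one match is found.
my @word_runs = $line =~ /Fruit:\s([\w\s]+)/g;

print "single words: ", join( " | ", @single_words ), "\n";
print "word runs: ",    join( " | ", @word_runs ),    "\n";
```

The first list holds "Mango" and "Banana"; the second holds the single string "Mango and the color of the Fruit". The character class suits the "Fruits: Mango Banana" form, where everything after the label is part of the list.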

Upvotes: 0

Bulrush

Reputation: 558

use strict;
use warnings;

my $line = "I have Fruit: Mango and the color of the Fruit: Banana is green";
# The /g modifier finds all matches in $line; the parentheses capture only the fruit name.
my @found = ( $line =~ m/Fruit: (\w+)/g );
foreach my $s (@found) {
    print "$s\n";
}

Upvotes: 0

Toto

Reputation: 91415

Use this instead: ([\w\s]+)

if($line =~ /Fruits:\s([\w\s]+)/) {
    $myFav = $1;
}
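One thing to watch with [\w\s]+ is that it also captures trailing whitespace, including the newline if the line has not been chomped. A minimal sketch of one way to handle that follows; the helper name fav_fruits and the trimming step are assumptions, not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: capture the fruit list and strip trailing whitespace.
sub fav_fruits {
    my ($line) = @_;
    return unless $line =~ /Fruits:\s([\w\s]+)/;

    my $fav = $1;
    $fav =~ s/\s+\z//;    # remove trailing spaces and newline from the capture
    return $fav;
}

print "'", fav_fruits("Fruits: Mango Banana \n"), "'\n";    # prints 'Mango Banana'
```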

And always:

use strict;
use warnings;

Upvotes: 0
