Reputation: 73

Retrieve string between two string delimiters using regex in perl

I have been working on this for a little while now and can't seem to figure it out. I have a file containing a bunch of lines, all structured like the one below: each line starts with "!" and has three "<DIV>" separators.

!the<DIV>car<DIV>drove down the<DIV>road off into the distance

I am interested in retrieving the last string, "road off into the distance", but I can't seem to get it to work. My current code is listed below.

while($line = <INFILE>) {
    $line =~ /<SEP>{3}(.*)/;
    print $1;
}

Any help would be greatly appreciated!

Upvotes: 3

Answers (4)

zdim

Reputation: 66891

I don't know whether you insist on a regex or simply didn't think of anything else, but split will do this nicely:

$text = (split '<DIV>', $str)[-1];

If you regularly have such repeating patterns, split may well be better for the job than a pure regex. (split also takes a full regular expression as its pattern, of course.)
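As a quick check of the split approach on the sample line from the question, here is a minimal sketch. The variable names are illustrative only, and the last three lines show one caveat worth knowing: split discards trailing empty fields unless it is given a negative limit, so a line ending in <DIV> would yield the field before it.

```perl
use strict;
use warnings;

# Sample line from the question
my $str = '!the<DIV>car<DIV>drove down the<DIV>road off into the distance';

# split returns the pieces between delimiters; index -1 picks the last one
my @fields = split /<DIV>/, $str;
my $last   = $fields[-1];

print scalar(@fields), " fields\n";
print "$last\n";

# Caveat: trailing empty fields are dropped unless the limit is negative
my @kept = split /<DIV>/, 'a<DIV>b<DIV>', -1;
print scalar(@kept), " fields with limit -1\n";
```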

ADDED

All this can be done directly if you only need to pull the last thing from each line:

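open my $fh, '<', $file or die "Can't open $file: $!";
# chomp first, so the last field does not carry the line's newline
my @text = map { chomp; (split '<DIV>')[-1] } <$fh>;
close $fh;
print "$_\n" for @text;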

split uses $_ by default, which inside the map is the element currently being processed. For lines without a <DIV> this returns the whole line. A filehandle in list context returns all lines as a list; the list context is imposed by map here.

In case you want all text between delimiters you can do

my @rlines = map { [ split '<DIV>' ] } <$fh>;

where [ ] creates a reference to an anonymous array holding the list returned by split, so @rlines contains array references, each with the text between <DIV>s on a line. The leading ! is still there, though, and dropping it takes a little more processing.
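One way that extra processing might look is sketched below. The two input lines are hypothetical stand-ins for what <$fh> would return, and copying each line into $line before editing is just one choice among several.

```perl
use strict;
use warnings;

# Hypothetical input; in practice these lines would come from <$fh>
my @lines = (
    "!the<DIV>car<DIV>drove down the<DIV>road off into the distance\n",
    "!a<DIV>b<DIV>c<DIV>d\n",
);

my @rlines = map {
    chomp(my $line = $_);        # copy the line, then remove its newline
    $line =~ s/^!//;             # drop the leading "!"
    [ split /<DIV>/, $line ];    # anonymous array of the fields
} @lines;

# Each element of @rlines is an array reference
print join('|', @$_), "\n" for @rlines;
```

Each element is then accessed as $rlines[0][1], $rlines[1][3], and so on.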

Of course, for the map block you can use { (/.*<DIV>(.*)/)[0] } from Jim Garrison's answer for a single match, or modify the regex a little to catch'em all.
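A possible regex for catching all the fields at once is sketched here. It is only one of several ways to write it, and it assumes the line has already been chomped and starts with "!" as in the question.

```perl
use strict;
use warnings;

my $line = '!the<DIV>car<DIV>drove down the<DIV>road off into the distance';

# Each match starts at the leading "!" or at a <DIV>, then lazily captures
# text up to (but not including) the next <DIV> or the end of the string.
# With /g in list context, the match returns every capture.
my @fields = $line =~ /(?:^!|<DIV>)(.*?)(?=<DIV>|\z)/g;

print "$_\n" for @fields;
```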

If performance is a factor, then that's a slightly different question.

Upvotes: 3

Sandeep

Reputation: 51

Simple regex which answers your question:

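while (my $line = <INFILE>) {
    # Greedy .* consumes everything up to the last <DIV> on the line
    my ($match) = $line =~ /.*<DIV>(.*)/;
    # Print inside the loop so every line is reported, not only the final one
    print "$match\n" if defined $match;
}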

Upvotes: 0

fugu

Reputation: 6578

A simple substitution could work too:

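use feature 'say';    # needed for say

while (<DATA>) {
    chomp;
    # Copy $_ and strip everything through the last <DIV> (greedy .*)
    (my $text = $_) =~ s/.*<DIV>//;
    say $text;
}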

Upvotes: 0

Jim Garrison

Reputation: 86774

The statement

@b = $a =~ /^!(.*?)<DIV>(.*?)<DIV>(.*?)<DIV>(.*)/

will split the string into a list, and you can then extract the 4th element with

$b[3]

If you really want only the last one, do this instead:

($text) = $a =~ /^!.*<DIV>(.*)/
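For what it's worth, the pattern in the question cannot match for two reasons: the delimiter in the data is <DIV>, not <SEP>, and {3} binds only to the preceding ">", so it asks for "<SEP>>>". A sketch of the counted form, with the whole "text then delimiter" unit wrapped in a non-capturing group, next to the greedy form above. The sample line is taken from the question; the variable names are illustrative.

```perl
use strict;
use warnings;

my $line = '!the<DIV>car<DIV>drove down the<DIV>road off into the distance';

# Quantify the whole "text then delimiter" unit exactly three times
my ($by_count) = $line =~ /^!(?:.*?<DIV>){3}(.*)/;

# Greedy .* runs to the last <DIV> on the line, however many there are
my ($by_greedy) = $line =~ /^!.*<DIV>(.*)/;

print "$by_count\n";
print "$by_greedy\n";
```

The counted form fails on lines with fewer than three delimiters, while the greedy form only needs one.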

Upvotes: 3
