singingfish

Reputation: 3167

How can I preserve whitespace when I match and replace several words in Perl?

Let's say I have some original text:

here is some text that has a substring that I'm interested in embedded in it.

I need the text to match a part of it, say: "has a substring".

However, the original text and the matching string may have whitespace differences. For example the match text might be:

has a
substring

or

has  a substring

and/or the original text might be:

here is some
text that has
a substring that I'm interested in embedded in it.

What I need my program to output is:

here is some text that [match starts here]has a substring[match ends here] that I'm interested in embedded in it.

I also need to preserve the whitespace pattern in the original and just add the start and end markers to it.

Any ideas about a way of using Perl regexes to get this to happen? I tried, but ended up getting horribly confused.

Upvotes: 1

Views: 3191

Answers (5)

Brad Gilbert

Reputation: 34120

This is an example of how you could do that.

#! /opt/perl/bin/perl
use strict;
use warnings;

my $submatch = "has a\nsubstring";

my $str = "
here is some
text that has
a substring that I'm interested in, embedded in it.
";

print substr_match($str, $submatch), "\n";

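sub substr_match {
  my ($string, $match) = @_;

  # Let any run of whitespace in $match match any run of whitespace in $string
  $match =~ s/\s+/\\s+/g;

  # This isn't safe the way it is now, you will need to sanitize $match
  # The capture group keeps the original whitespace of the matched text
  $string =~ s/\b($match)\b/[match starts here]$1\[match ends here]/g;

  return $string;
}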

This currently does not do anything to check the $match variable for unsafe characters.

Upvotes: 0

David Andres

Reputation: 31781

Been some time since I've used perl regular expressions, but what about:

$match =~ s/(has\s+a\s+substring)/[$1]/ig;

This matches one or more whitespace characters, including newlines, between the words. It will wrap the entire match with brackets while maintaining the original separation. It ain't automatic, but it does work.

You could play games with this, like taking the string "has a substring" and doing a transform on it to make it "has\s+a\s+substring" to make this a little less painful.

EDIT: Incorporated ysth's comment that the \s metacharacter matches newlines, and hobbs's corrections to my \s usage.

Upvotes: 5

Markus Jarderot

Reputation: 89221

As many have suggested, use \s+ to match whitespace. Here is how you do it automatically:

my $original = "here is some text that has a substring that I'm interested in embedded in it.";
my $search = "has a\nsubstring";

my $re = $search;
$re =~ s/\s+/\\s+/g;

$original =~ s/\b($re)\b/[match starts here]$1\[match ends here]/g;

print $original;

Output:

here is some text that [match starts here]has a substring[match ends here] that I'm interested in embedded in it.

You might want to escape any meta-characters in the string. If someone is interested, I could add it.
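
One way to add that escaping, as a minimal sketch: split the search string on whitespace, run each word through Perl's built-in quotemeta, and join the pieces with \s+. The sub name build_pattern and the sample strings are made up for illustration. The \b anchors are left out here because they would not match next to a non-word character such as the closing parenthesis.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a search string into a whitespace-tolerant pattern.
sub build_pattern {
    my ($search) = @_;
    # split ' ' drops leading whitespace and splits on runs of whitespace;
    # quotemeta escapes regex metacharacters in each word.
    my $re = join '\s+', map { quotemeta($_) } split ' ', $search;
    return qr/$re/;
}

my $original = "the total cost\n(in USD) is shown below";
my $re       = build_pattern("cost (in  USD)");

# Copy the original, then wrap each match; $1 keeps the original whitespace.
(my $marked = $original) =~ s/($re)/[match starts here]$1\[match ends here]/g;

print $marked, "\n";
```

The parentheses in the search string are matched literally, and the newline between "cost" and "(in" in the original is left as it was.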

Upvotes: 0

friedo

Reputation: 67028

In regexes, you can use + to mean "one or more." So something like this

/has\s+a\s+substring/

matches has followed by one or more whitespace chars, followed by a followed by one or more whitespace chars, followed by substring.

Putting it together with a substitution operator, you can say:

my $str = "here is some text that has     a  substring that I'm interested in embedded in it.";
$str =~ s/(has\s+a\s+substring)/\[match starts here]$1\[match ends here]/gs;

print $str;

And the output is:

here is some text that [match starts here]has     a  substring[match ends here] that I'm interested in embedded in it.

Upvotes: 2

Doug Hays

Reputation: 1507

This pattern will match the string that you're looking to find:

(has\s+a\s+substring)

So, when the user enters a search string, replace any whitespace in the search string with \s+ and you have your pattern. Then, just replace every match with [match starts here]$1[match ends here] where $1 is the matched text.
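
A rough sketch of those two steps applied to the line-broken text from the question. The sub name mark_matches is hypothetical, and the quotemeta call is an extra precaution not mentioned above. Keeping the markers in variables avoids Perl reading "$1[" as an array subscript inside the replacement.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap every whitespace-insensitive match of $search.
sub mark_matches {
    my ($text, $search) = @_;
    my ($start, $end) = ('[match starts here]', '[match ends here]');

    # Step 1: whitespace in the search string becomes \s+ between escaped words.
    my $pattern = join '\s+', map { quotemeta($_) } split ' ', $search;
    return $text if $pattern eq '';    # nothing to search for

    # Step 2: wrap each match; $1 carries the original whitespace through.
    $text =~ s/($pattern)/$start$1$end/g;
    return $text;
}

my $original = "here is some\ntext that has\na substring that I'm interested in embedded in it.";
print mark_matches($original, "has  a substring"), "\n";
```

The line breaks inside the matched region stay exactly as they were in the original; only the two markers are added.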

Upvotes: 3
