Reputation: 2157
I'm working with a hash in Perl.
I have several strings of varying lengths, each containing a varying number of - separators:
pre1-pre2-text1-text2
pre3-text3
pre4-pre5-pre6-text4
I have a hash, %hash, with the following keys:
pre1-pre2
pre3
pre4-pre5-pre6
So the keys of %hash contain only the pre part of each string.
How can I check whether a given string, say the first one, pre1-pre2-text1-text2, matches one of the keys of %hash?
Upvotes: 1
Views: 298
Reputation: 66891
One way: build a pattern as an alternation of the keys, and test the strings against it:
use warnings;
use strict;
use feature 'say';
my @strings = qw(pre-not pre1-pre2-text1-text2 pre3-text3 pre4-pre5-pre6-text4);
my %h = ( 'pre1-pre2' => 1, 'pre3' => 1, 'pre4-pre5-pre6' => 1 );
my $keys_re = join '|', map { quotemeta } keys %h;
foreach my $str (@strings) {
say $str if $str =~ /$keys_re/;
}
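In the worst case this is quadratic (every string is tried against every key), but the alternation stops at the first key that matches, and the matching runs inside the regex engine, which is implemented in C.
A possible improvement (or a necessity!) may be to sort the keys suitably, for example shortest first: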
my $keys_re = join '|', map { quotemeta } sort { length $a <=> length $b } keys %h;
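This may matter if some keys share common parts. Note that it can be a non-trivial adjustment: the alternation takes the first alternative that matches, so the order can change which key is found. It can therefore affect correctness, and may even be required for it, so consider it carefully.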
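To see how the order can change which key is reported, here is a small sketch. The overlapping keys pre1 and pre1-pre2 are made up for illustration (the keys in the question do not overlap), and first_match is a hypothetical helper, not part of the code above.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical overlapping keys, chosen only to show the effect of ordering.
my @keys = ('pre1', 'pre1-pre2');

# Build an alternation from the keys in the given order and return
# the alternative that matched, or undef if none did.
sub first_match {
    my ($str, @ordered) = @_;
    my $re = join '|', map { quotemeta } @ordered;
    return $str =~ /($re)/ ? $1 : undef;
}

my $str = 'pre1-pre2-text1';
my @shortest_first = sort { length $a <=> length $b } @keys;
my @longest_first  = sort { length $b <=> length $a } @keys;

say first_match($str, @shortest_first);   # pre1
say first_match($str, @longest_first);    # pre1-pre2
```

With shortest first the shorter key shadows the longer one; with longest first the longest matching key is reported. Which of the two is correct depends on what the keys mean in your data.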
To also get the key itself, add capturing parentheses around the pattern:
foreach my $str (@strings) {
say "$str matched by key: $1" if $str =~ /($keys_re)/;
}
where $1 contains the alternative that matched and was captured, which is the key.
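Note that the pattern above is unanchored, so a key is found anywhere in the string. If a key should count only as a leading prefix, one possible variation is to anchor the alternation. The group around the alternation matters here, because otherwise ^ would apply only to the first alternative. This is a sketch under that assumption; key_prefix_of is a hypothetical helper and the strings text-pre3 and pre30-text are made up.

```perl
use strict;
use warnings;
use feature 'say';

my %h = ( 'pre1-pre2' => 1, 'pre3' => 1, 'pre4-pre5-pre6' => 1 );

# Longest keys first, so a longer key wins over a shorter key it starts with.
my $keys_re = join '|', map { quotemeta } sort { length $b <=> length $a } keys %h;

# Return the key that is a leading prefix of $str (followed by '-' or the
# end of the string), or undef if there is none.
sub key_prefix_of {
    my ($str) = @_;
    return $str =~ /^($keys_re)(?=-|\z)/ ? $1 : undef;
}

for my $str (qw(pre3-text3 text-pre3 pre30-text)) {
    my $key = key_prefix_of($str);
    say defined $key ? "$str starts with key $key" : "$str: no key prefix";
}
```

The lookahead (?=-|\z) is an extra assumption: it keeps the key pre3 from matching a string that starts with pre30.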
Upvotes: 2
Reputation: 6626
This answer assumes that pre cannot occur in the middle of the string (i.e., you won't have a string like pre1-pre2-text1-pre5 where the prefix should be only pre1-pre2). If this assumption isn't valid, use /^((?:pre\d+)(?:-pre\d+)*)/ instead of /^(.*pre\d+)/ (I prefer the latter because it's more readable, but the former is more precise).
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my %pre = map { $_ => 1 } qw(pre1-pre2 pre3 pre4-pre5-pre6);
while (<DATA>) {
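my ($prefix) = /^(.*pre\d+)/;   # greedy: captures up to the last preN on the line
if (defined $prefix && exists $pre{$prefix}) {
say "Prefix exists: $prefix";
} else {
# $prefix is undef when the line contains no preN at all
say "Prefix doesn't exist: ", $prefix // '(none)';
}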
}
__DATA__
pre1-pre2-text1-text2
pre3-text3
pre4-pre5-pre6-text4
pre7-pre8-text5
If you could have a line such as pre1-pre2-text1 where the prefix should be just pre1, then this solution won't work. In that case, you'll need to iterate over all the keys of the hash and check whether each one matches the beginning of the string:
while (<DATA>) {
for my $prefix (keys %pre) {
if (/^\Q$prefix/) {
say "Found prefix: $prefix";
last;
}
}
}
However, this is far less efficient, since you need to iterate over all of the hash keys for each line.
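If the loop over all keys becomes a bottleneck, one alternative (not part of the approach above, only a sketch) is to work from the string instead: repeatedly drop its last - separated segment and look each candidate up in the hash. That costs one hash lookup per segment rather than one regex match per key, and it returns the longest matching prefix. The helper name longest_prefix is hypothetical, and the key pre1 is added purely for illustration.

```perl
use strict;
use warnings;
use feature 'say';

my %pre = map { $_ => 1 } qw(pre1 pre1-pre2 pre3);   # 'pre1' added for illustration

# Try the whole string, then the string minus its last segment, and so on.
sub longest_prefix {
    my ($str, $known) = @_;
    my @parts = split /-/, $str;
    while (@parts) {
        my $candidate = join '-', @parts;
        return $candidate if exists $known->{$candidate};
        pop @parts;
    }
    return;   # no prefix of $str is a key
}

for my $line (qw(pre1-pre2-text1-text2 pre1-text1 pre7-pre8-text5)) {
    my $prefix = longest_prefix($line, \%pre);
    say defined $prefix ? "Found prefix: $prefix" : "No prefix for: $line";
}
```

This relies on the assumption that keys always end at a - boundary, which holds for the examples in the question.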
Regarding \Q: it ensures that this solution works even if your prefixes contain special regex characters (like + or .). If your prefixes always look like pre1-pre2, you can omit \Q.
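As a quick illustration of what \Q protects against, take a made-up prefix containing a dot. Without \Q the dot matches any character, so an unrelated string is accepted. The values of $prefix and $line below are hypothetical.

```perl
use strict;
use warnings;
use feature 'say';

my $prefix = 'v1.0';        # hypothetical prefix containing a regex metacharacter
my $line   = 'v1x0-text';   # should NOT count as starting with 'v1.0'

my $unquoted = $line =~ /^$prefix/   ? 1 : 0;   # '.' matches the 'x'
my $quoted   = $line =~ /^\Q$prefix/ ? 1 : 0;   # '.' is matched literally

say "without \\Q: ", $unquoted ? 'match' : 'no match';
say "with \\Q:    ", $quoted   ? 'match' : 'no match';
```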
If you have trouble understanding my %pre = map { $_ => 1 } qw(pre1-pre2 pre3 pre4-pre5-pre6); it's a concise version of
my %pre = (
'pre1-pre2' => 1,
'pre3' => 1,
'pre4-pre5-pre6' => 1
);
Upvotes: 1
Reputation: 3967
I put the inputs you gave into a small Perl script that checks whether any key matches:
#!/usr/bin/perl
use strict;
use warnings;
my %langs = ( "pre1-pre2" => 'pre1-pre2',
"pre3" => 'pre3',
"pre4-pre5-pre6" => 'pre4-pre5-pre6');
my @pats = ("pre1-pre2-text1-text2", "pre3-text3", "pre4-pre5-pre6-text4");
for my $key (keys %langs) {
foreach my $ss (@pats) {
# index() returns -1 when $key does not occur anywhere in $ss
if (index($ss, $key) != -1) {
print("Key contains:", $key, "|", $ss, "\n");
}
else {
print("NOT FOUND:", $key, "|", $ss, "\n");
}
}
}
NOTE: This should help, provided I have understood your requirement correctly.
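One thing to keep in mind: index returns the position of the first occurrence, so comparing with -1 accepts a key anywhere in the string. If the key has to be at the very start, comparing with 0 is a possible tightening. A minimal sketch, where starts_with is a hypothetical helper and text-pre3 is a made-up string:

```perl
use strict;
use warnings;
use feature 'say';

# True if $key occurs at the very start of $str.
sub starts_with {
    my ($str, $key) = @_;
    return index($str, $key) == 0;
}

say starts_with('pre3-text3', 'pre3') ? 'prefix' : 'not a prefix';
say starts_with('text-pre3',  'pre3') ? 'prefix' : 'not a prefix';
```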
Upvotes: 1