user1987607

Reputation: 2157

Check if part of a string is present in keys of hash table

I'm dealing with a hash table in Perl.

I have multiple strings of varying lengths, each with a varying number of - separators:

pre1-pre2-text1-text2
pre3-text3
pre4-pre5-pre6-text4

I have a %hash with the following keys:

pre1-pre2
pre3
pre4-pre5-pre6

So the keys of %hash contain only the pre part (the prefix) of the strings.

How can I check whether a string, say the first one, pre1-pre2-text1-text2, matches one of the keys of %hash?

Upvotes: 1

Views: 298

Answers (3)

zdim

Reputation: 66891

One way: build a pattern from an alternation of the keys, and test each string against it:

use warnings;
use strict;
use feature 'say';

my @strings = qw(pre-not pre1-pre2-text1-text2 pre3-text3 pre4-pre5-pre6-text4);

my %h = ( 'pre1-pre2' => 1, 'pre3' => 1, 'pre4-pre5-pre6' => 1 );

my $keys_re = join '|', map { quotemeta } keys %h; 

foreach my $str (@strings) { 
    say $str  if $str =~ /$keys_re/;
}

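In the worst case this compares every string with every key (time proportional to the number of strings times the number of keys), but the alternation stops at the first key that matches, and the matching runs inside the regex engine, which is written in C.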
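The pattern above is not anchored, so it also reports a key found in the middle of a string (xpre3-text3) or a key that is only the start of a longer field (pre3 inside pre30-text3). If keys should match only as whole leading prefixes, a minimal sketch (my own variation; the helper name has_key_prefix and the extra test strings are made up for illustration) anchors the alternation with ^ and requires a - or the end of the string right after the key:

```perl
use strict;
use warnings;
use feature 'say';

my %h = ( 'pre1-pre2' => 1, 'pre3' => 1, 'pre4-pre5-pre6' => 1 );

my $keys_re = join '|', map { quotemeta } keys %h;

# Anchored at the start (^) and required to end at a '-' or at the
# end of the string (\z), so a key matches only as a whole leading prefix.
my $prefix_re = qr/^(?:$keys_re)(?:-|\z)/;

# Hypothetical helper: 1 if $str starts with one of the keys, else 0
sub has_key_prefix {
    my ($str) = @_;
    return $str =~ $prefix_re ? 1 : 0;
}

for my $str (qw(pre1-pre2-text1-text2 pre30-text3 xpre3-text3 pre3)) {
    say "$str: ", has_key_prefix($str) ? 'match' : 'no match';
}
```

With this sketch pre1-pre2-text1-text2 and pre3 match, while pre30-text3 and xpre3-text3 do not.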

A possible improvement (or a necessity!) is to sort the keys suitably, for example shortest first:

my $keys_re = join '|', map { quotemeta } sort { length $a <=> length $b } keys %h; 

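Ordering matters when some keys share a common beginning (for example pre1 and pre1-pre2): the regex engine accepts the first alternative that matches at a given position, so the order decides which key is reported. This can affect correctness, so choose the order deliberately; sort longest first if you want the longest matching key.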
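To see why the order can matter, here is a small sketch with two hypothetical keys, pre1 and pre1-pre2, where one is the start of the other; matched_key is an illustrative helper, not part of the answer. The anchored alternation reports whichever key comes first in the list:

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical keys where one key is the start of another
my %h = ( 'pre1' => 1, 'pre1-pre2' => 1 );

# Returns the key captured by an anchored alternation built from @keys,
# in the order given, or undef if none matches at the start of $str
sub matched_key {
    my ($str, @keys) = @_;
    my $re = join '|', map { quotemeta } @keys;
    return $str =~ /^($re)/ ? $1 : undef;
}

my $str = 'pre1-pre2-text1-text2';

my @shortest_first = sort { length $a <=> length $b } keys %h;
my @longest_first  = sort { length $b <=> length $a } keys %h;

say 'shortest first: ', matched_key($str, @shortest_first);   # pre1
say 'longest first:  ', matched_key($str, @longest_first);    # pre1-pre2
```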

To also get the key itself, add capturing parentheses around the pattern:

foreach my $str (@strings) { 
    say "$str matched by key: $1"  if $str =~ /($keys_re)/;
}

Here $1 holds the alternative that matched and was captured, which is the key.

Upvotes: 2

Dada

Reputation: 6626

This answer assumes that pre cannot occur in the middle of the string (i.e., you won't have a string like pre1-pre2-text1-pre5, where your prefix would be only pre1-pre2). If this assumption doesn't hold, use /^((?:pre\d+)(?:-pre\d+)*)/ instead of /^(.*pre\d+)/ (I prefer the latter because it's more readable, but the former is more precise).

#!/usr/bin/perl

use strict;
use warnings;
use feature 'say';

my %pre = map { $_ => 1 } qw(pre1-pre2 pre3 pre4-pre5-pre6);

while (<DATA>) {
    chomp;
    my ($prefix) = /^(.*pre\d+)/;
    if (defined $prefix && exists $pre{$prefix}) {
        say "Prefix exists: $prefix";
    } else {
        # $prefix is undef when the line contains no "preN" at all
        say "Prefix doesn't exist: ", $prefix // $_;
    }
}

__DATA__
pre1-pre2-text1-text2
pre3-text3
pre4-pre5-pre6-text4
pre7-pre8-text5
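A quick way to see the difference between the two patterns mentioned at the top of this answer is to run both on the problematic line pre1-pre2-text1-pre5. This sketch (extract is just an illustrative helper name) shows what each one captures:

```perl
use strict;
use warnings;
use feature 'say';

# The two prefix-extracting patterns discussed above
my $greedy  = qr/^(.*pre\d+)/;                  # everything up to the LAST preN
my $precise = qr/^((?:pre\d+)(?:-pre\d+)*)/;    # only the leading run of preN-preN...

# Returns the first capture of $re on $str, or undef if there is no match
sub extract {
    my ($str, $re) = @_;
    my ($prefix) = $str =~ $re;
    return $prefix;
}

my $line = 'pre1-pre2-text1-pre5';
say 'greedy:  ', extract($line, $greedy);    # pre1-pre2-text1-pre5
say 'precise: ', extract($line, $precise);   # pre1-pre2
```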

If you could have a line pre1-pre2-text1 where the prefix should be just pre1, this solution won't work. In that case, one option is to iterate over all the keys of the hash and check whether each one matches the beginning of the string:

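while (<DATA>) {
    # keys() returns the keys in no particular order, so if several
    # keys match, which one is reported first is arbitrary
    for my $prefix (keys %pre) {
        if (/^\Q$prefix/) {
            say "Found prefix: $prefix";
            last;
        }
    }
}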

However, this is far less efficient, since you need to iterate over all of the hash keys for each line.
Regarding \Q: it ensures that this solution works even if your prefixes contain special regex characters (like + or .). If your prefixes always look like pre1-pre2, you can omit \Q.
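A different technique, not part of this answer, avoids scanning every key: assuming keys always end at a - boundary, build candidate prefixes from the leading --separated fields, longest first, and look each one up in the hash. The cost then depends on the number of fields in the line rather than on the number of keys. A minimal sketch (find_prefix is a hypothetical name):

```perl
use strict;
use warnings;
use feature 'say';

my %pre = map { $_ => 1 } qw(pre1-pre2 pre3 pre4-pre5-pre6);

# Try the prefixes formed by the leading '-'-separated fields,
# longest first, and return the first one that is a key of %$keys.
# Assumes every key ends exactly at a '-' boundary.
sub find_prefix {
    my ($str, $keys) = @_;
    my @parts = split /-/, $str;
    for my $n (reverse 1 .. @parts) {
        my $candidate = join '-', @parts[0 .. $n - 1];
        return $candidate if exists $keys->{$candidate};
    }
    return;    # no leading prefix is a key
}

for my $line (qw(pre1-pre2-text1-text2 pre3-text3 pre4-pre5-pre6-text4 pre7-pre8-text5)) {
    my $found = find_prefix($line, \%pre);
    say defined $found ? "Found prefix: $found" : "No prefix found: $line";
}
```

Because it tries the longest candidate first, it also covers the pre1-pre2-text1 case where only pre1 is a key, and it prefers pre1-pre2 when both pre1 and pre1-pre2 are keys.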


In case the line my %pre = map { $_ => 1 } qw(pre1-pre2 pre3 pre4-pre5-pre6); is hard to read, it is a concise version of:

my %pre = (
    'pre1-pre2'      => 1,
    'pre3'           => 1,
    'pre4-pre5-pre6' => 1
);

Upvotes: 1

Raghuram

Reputation: 3967

I put the inputs you gave into a small Perl script, and with it I can check whether there is a match against the keys:

#!/usr/bin/perl
use strict;
use warnings;

my %langs = (
    "pre1-pre2"      => 'pre1-pre2',
    "pre3"           => 'pre3',
    "pre4-pre5-pre6" => 'pre4-pre5-pre6',
);

my @pats = ("pre1-pre2-text1-text2", "pre3-text3", "pre4-pre5-pre6-text4");

for my $key (keys %langs) {
    foreach my $ss (@pats) {
        # index() returns the position of $key inside $ss, or -1 if absent;
        # compare with == 0 instead to accept only matches at the start
        if (index($ss, $key) != -1) {
            print("Key contains:", $key, "|", $ss, "\n");
        }
        else {
            print("NOT FOUND:", $key, "|", $ss, "\n");
        }
    }
}

NOTE: If I have understood your requirement correctly, this will help you.
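index finds a key anywhere in the string, so pre3 would also be reported for a string like pre30-text9. If only prefixes should count, here is a sketch of a stricter check (starts_with_key is a hypothetical helper, and pre30-text9 is a made-up test string): it requires index to return 0 and the key to be followed by - or the end of the string.

```perl
use strict;
use warnings;

my %langs = map { $_ => $_ } qw(pre1-pre2 pre3 pre4-pre5-pre6);
my @pats  = ("pre1-pre2-text1-text2", "pre3-text3", "pre4-pre5-pre6-text4", "pre30-text9");

# 1 if $str starts with $key and the key is followed by '-' or is the
# whole string; 0 otherwise
sub starts_with_key {
    my ($str, $key) = @_;
    return 0 unless index($str, $key) == 0;
    return (length($str) == length($key)
            || substr($str, length($key), 1) eq '-') ? 1 : 0;
}

for my $ss (@pats) {
    my @hits = grep { starts_with_key($ss, $_) } sort keys %langs;
    print "$ss: ", (@hits ? join(', ', @hits) : 'NOT FOUND'), "\n";
}
```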

Upvotes: 1
