Mariya

Reputation: 847

How do I make inexact string comparisons with Perl?

Given two strings, I want to find all common substrings of a specified length, but allowing one character to be different.

For example, if s1 is 'ATCAGC', s2 is 'ATAATCGAC', and the specified length is 3, then I'd want output along these lines:

ATC from s1 matches ATA, ATC from s2
TCA from s1 matches TAA, TCG from s2


Upvotes: 3

Views: 563

Answers (1)

Jeff Burdges

Reputation: 4261

First, a Google search for "perl hamming distance" turns up a PerlMonks thread that mentions Text::LevenshteinXS, various typical implementations, and a cute XOR trick:

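# Hamming distance of two equal-length strings: XOR them, count the NUL bytes (matching positions), subtract from the length.
sub hd{ length( $_[ 0 ] ) - ( ( $_[ 0 ] ^ $_[ 1 ] ) =~ tr[\0][\0] ) }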

Skim the Wikipedia article on string metrics if Levenshtein distance or Hamming distance isn't familiar to you.
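
To tie this back to the question, here is a rough sketch of how a Hamming-distance helper could be used to compare every fixed-length substring of s1 against every fixed-length substring of s2. The name near_matches and its parameters are made up for illustration; hd is the same XOR idea as above, written out with an explicit temporary. It assumes plain byte strings and that the newer "bitwise" feature is not enabled (with it enabled, string XOR is spelled ^. instead of ^).

```perl
use strict;
use warnings;

# Hamming distance of two equal-length byte strings (XOR trick).
sub hd {
    my ($x, $y) = @_;
    my $xor  = $x ^ $y;               # string XOR: NUL wherever the bytes agree
    my $same = ($xor =~ tr/\0//);     # count the NUL bytes
    return length($x) - $same;
}

# Hypothetical helper: for each substring of $s1 of length $len, collect the
# substrings of $s2 of the same length that differ in at most $max_diff places.
# Returns a list of [ substring_of_s1, [ matching substrings of s2 ] ].
sub near_matches {
    my ($s1, $s2, $len, $max_diff) = @_;
    my @k2 = map { substr($s2, $_, $len) } 0 .. length($s2) - $len;
    my @out;
    for my $i (0 .. length($s1) - $len) {
        my $k1   = substr($s1, $i, $len);
        my @hits = grep { hd($k1, $_) <= $max_diff } @k2;
        push @out, [ $k1, \@hits ] if @hits;
    }
    return @out;
}

for my $r (near_matches('ATCAGC', 'ATAATCGAC', 3, 1)) {
    printf "%s from s1 matches %s from s2\n", $r->[0], join(', ', @{ $r->[1] });
}
```

On the strings from the question this prints the two lines asked for, plus a third (AGC from s1 matches ATC from s2), since AGC and ATC also differ in only one position. The comparison is quadratic in the number of substrings, which is fine for short inputs but may need an index or one of the CPAN modules for long sequences.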

Upvotes: 3
