mahesh cs
mahesh cs

Reputation: 335

find all common patterns (substrings) from two files efficiently

I am looking for algorithm which will extract all common patterns from files, the naive algorithm takes O(n^2). To find out all common pattern I need to generate all sub-strings and check it in another given file. I am looking for some data structure or algorithm so that there will be no require to generate all sub strings.Is there any efficient and elegant algorithm for the same.

For the sake of simplicity we will consider file as string. Lets say we have to string str1 = "xxabcyy" and str2="sydabcdy" so the expected out put is {"abc","y"}. The naive method is compare each substring of str1 with str2.For example I have all possible sub strings of str1 i.e. {"x","xx","xxa","xxab","xxabc","xxabcy","xxabcyy","xa","xab",..} then check each of this sub string is in str2 or not.

Upvotes: 0

Views: 266

Answers (1)

Gianluca Ghettini
Gianluca Ghettini

Reputation: 11688

Check Apriori and FPGrowth algorithms

https://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Frequent_Pattern_Mining/The_FP-Growth_Algorithm

Used in market basket analysis and general common pattern analysis

Upvotes: 1

Related Questions