MCSharp
MCSharp

Reputation: 1068

Finding duplicate Substrings?

Is there a logical way to find duplicate substrings in a string that would work regardless of how many times the substring is duplicated; then return the amount of times it was duplicated with the final word.

For example, abc-abc-abc-
Can be broken into | abc- | abc- | abc-
                                 = abc- x3

Example 2, abc-abc-abc-abc-
Can be broken into | abc- | abc- | abc- | abc-
                                        = abc- x4

For even repetitions, it is not a problem to split the string in half and then compare both substrings. Then you can keep looping until the halves do not match.

For odd string length, you can break it into 3rds and compare all three parts and do the same thing.

The problem happens when a word repeats 7 or 11 times. Dividing the length by 4 or 5 could work.

So for example, a string with yesnoyesnoyesnoyesnoyesnoyesnoyesno has the substring yesno repeating 7 times.

Is there some sort of a formula, regex, or linq that can transform yesnoyesnoyesnoyesnoyesnoyesnoyesno to yesno (x7)?

Upvotes: 0

Views: 1619

Answers (2)

alpha bravo
alpha bravo

Reputation: 7948

this worked for me (..*?)(?=\1(?=\1*$)|$) demo
or even shorter (..*?)(?=\1*$) demo

  • (..*?) capture at least one character and add as many as needed
  • (?=\1*$) lookahead for the previously captured results as many times as possible to the end.

Upvotes: 4

Tim.Tang
Tim.Tang

Reputation: 3188

var list=new string[]{"abc-abc-abc-",
                    "abc-abc-abc-abc-",
                    "yesnoyesnoyesnoyesnoyesnoyesnoyesno"};
var reg=new Regex("(.+?)(?=\\1|$)");
foreach(var str in list)
{
  string result=string.Format("{0} (x{1})",reg.Match(str).Value,  reg.Matches(str).Count);
  Console.WriteLine(result);
}

OUTPUT:

abc- (x3)
abc- (x4)
yesno (x7)

Upvotes: 1

Related Questions