Reputation: 141320
How can you find repeating sequences of at least 30 digits?
Sample of the data
2.375854214123006833712984054669703872437357630979498861047835990888382687927107061503416856492027334851936218678815489749430523917995444191343963553530751708428246013667425968109339407744874715261958997722095671981776765375854214123006833712984054669703872437357630979498861047835990888382687927107061503416856492027334851936218678815489749430523917995444191343963553530751708428246013667425968109339407744874715261958997722095671981776765375854214123006833712984054669703872437357630979498861047835990888382687927107061503416856492027334851936218678815489749430523917995444191343963553530751708428246013667425968109339407744874715261958997722095671981776765375854214123006833712984054669703872437357630979498861047835990888382687927107061503416856492027334851936218678815489749430523917995444191343963553530751708428246013667425968109339407744874715261958997722095671981776765375854214123006833712984054669703872437357630979498861047835990888382687927107061503416856492027334851936218678815489749430523917995444191343963553530751708428246013667425968109339407744874715261958997722095671981776765375854214123006833712984054669703872437357630979498861047835990888382687927107061503416856492027334851936218678815489749430523917995444191343963553530751708428246013667425968109339407744874715261958997722095671981776765375854214123006833712984054669703872437357630979498861047835990888382687927107061503416856492027334851936218678815489749430523917995444191343963553530751708428246013667425968109339407744874715261958997722095671981776765375854214123006833712984054669703872437357630979498861047835990888382687927107061503416856492027334851936218678815489749430523917995444191343963553530751708428246013667425968109339407744874715261958997722095671981776765375854214123006833712984054669703872437357630979498861047835990888382687927107061503416856492027334851936218678815489749430523917995444191343963553530751708428246013667425968109339407744874715261958997722095671981776765375854214123006833712984054
7
My attempt in Vim
:g/\(\d\{4}\)\[^\1\]\1/
|
|----------- Problem here!
I do not know how to negate the first captured group (the back-reference \1).
Upvotes: 3
Views: 8967
Reputation: 91038
First of all, to find your repeating numbers, you can use this simple search:
/\(\d\{5\}\).\{-}\1
This search finds repetitions of 5 digits. Unfortunately, Vim highlights everything from the start of the 5-digit number to the end of its repetition, including every digit in between, which makes it hard to see what the 5-digit number is. Also, because your number sequence repeats so often, the whole line ends up highlighted.
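If it is hard to read the repeated block from the highlighting, the same idea can be sketched outside Vim. The snippet below is a rough Python equivalent of the search above, not part of the original answer; the function name first_repeat and the sample string are made up for illustration.

```python
import re

def first_repeat(text, width=5):
    """Return (block, first_start, repeat_start) for the first block of
    `width` digits that shows up again later on the same line, or None."""
    # (\d{width}) captures the block, .*? lazily skips ahead (like \{-} in Vim),
    # and \1 requires the captured block to appear again.
    pattern = re.compile(r"(\d{%d}).*?\1" % width)
    m = pattern.search(text)
    if m is None:
        return None
    # The match ends with the repeated block, so it starts `width` before the end.
    return m.group(1), m.start(), m.end() - width

# Hypothetical sample input, not the data from the question.
print(first_repeat("9912345xx12345"))
```

This reports the block and both offsets directly, which avoids the "everything is highlighted" problem.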
You will probably find it's more useful to use :set incsearch
and type /\(\d\{5\}\).\{-}\1
or /\(\d\{5\}\)\ze.\{-}\1
without hitting enter so you can see what the digits are.
This command might be more useful to you:
:syn region repeatSection matchgroup=Search start=/\z(\d\{30}\)/ matchgroup=Error end=/\z1/ oneline
This will highlight a sequence of 30 digits with the Search highlight group (typically yellow) the first time it is seen, and with the Error group (typically red) when it is repeated. Note that this only works for a single line of text (multi-line isn't possible).
Upvotes: 5
Reputation: 89093
This command will match lines containing 123451234 but not 111111111:
:g/\(\d\{4}\)\1\@!.\1/
The \1\@!. part uses a negative lookahead to say "make sure this position doesn't match (\@!) group 1 (\1), then consume a character (.)".
Upvotes: 0
Reputation: 50179
If it helps you on the way, the appropriate way to make sure that the following set of characters isn't the same as what is stored in back-reference #1 would be (?!\1). Note that the (?!) (negative look-ahead) group is a zero-width assertion (i.e., it will not change the position of the cursor; it just checks whether the regex should fail or not).
Whether that is supported by the regex engine you're using, I don't know.
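For comparison, Python's re module is one engine that does accept (?!\1). The sketch below is only an illustration of the assertion on two short sample lines; the names PATTERN and has_gapped_repeat are invented for this example.

```python
import re

# Four digits, then a position where those four digits do NOT follow
# immediately, then one arbitrary character, then the same four digits again.
PATTERN = re.compile(r"(\d{4})(?!\1).\1")

def has_gapped_repeat(line):
    """True if `line` contains a 4-digit block repeated after a one-character gap."""
    return PATTERN.search(line) is not None

print(has_gapped_repeat("123451234"))
print(has_gapped_repeat("111111111"))
```

The look-ahead consumes nothing, so the single `.` after it is what actually moves past the gap character.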
I sketched this quickly on paper, and something along these lines might work in PCRE. I haven't tested it and can't right now, but maybe it'll give you some ideas:
(?=(\d{30}))\d(?=\d{29,}?\1)
To ensure that I understood you correctly, the purpose of the above regex would be to match any sequence of 30 digits that also exists later in the whole string being searched.
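As a way to try that pattern out, here is a small sketch using Python's re module rather than PCRE, which is an assumption on my part since the two engines may differ in details. The block width is a parameter so it can be checked on short strings; repeated_blocks is a hypothetical helper name.

```python
import re

def repeated_blocks(text, width=30):
    """List (start, block) for every block of `width` digits that occurs
    again later in the same run of digits without overlapping itself."""
    # (?=(\d{width}))   capture the block without consuming it
    # \d                consume one digit so the scan advances
    # (?=\d{width-1,}?\1)  skip the rest of the block (and any further
    #                   digits, lazily), then require the block again
    pattern = re.compile(r"(?=(\d{%d}))\d(?=\d{%d,}?\1)" % (width, width - 1))
    return [(m.start(), m.group(1)) for m in pattern.finditer(text)]

# Short hypothetical input with width=3 so the result is easy to verify by eye.
print(repeated_blocks("123123", width=3))
```

Because every starting position triggers its own forward scan, this can be slow on very long digit strings.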
My thoughts for the above regex were these:
Upvotes: 0
Reputation: 497232
I'm not sure why you need the negation. /\(\d\{4\}\)\1/ will match a sequence of (exactly) four digits that is immediately repeated once. You probably actually want something like /\(\d\{30,\}\)\1/ to get your "at least 30". This appears to work for me, unless I've misunderstood what you're trying to search for. Note that since the quantifier is greedy, you will get the longest possible repeated sequence.
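The same back-reference search can be tried in Python as a quick check. This is only a sketch under the assumption that an immediately adjacent repetition is what you are after; adjacent_repeat and its min_len parameter are names chosen for this example.

```python
import re

def adjacent_repeat(text, min_len=30):
    """Return (start, block) for the first block of at least `min_len`
    digits that is immediately followed by a copy of itself, or None."""
    # \d{min_len,} is greedy, so at the first position that matches,
    # the longest block that still repeats is the one reported.
    pattern = re.compile(r"(\d{%d,})\1" % min_len)
    m = pattern.search(text)
    return None if m is None else (m.start(), m.group(1))

# Hypothetical short input with min_len=3 to keep the example readable.
print(adjacent_repeat("xx12341234yy", min_len=3))
```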
Upvotes: 2