Nico
Nico

Reputation: 13

How to identify a periodic sequence in an array of integers

The main goal is to find a periodic sequence in an array with bash,for example :

{ 2, 5, 7, 8, 2, 6, 5, 3, 5, 4, 2, 5, 7, 8, 2, 6, 5, 3, 5, 4, 2, 5, 7, 8, 2, 6, 5, 3, 5, 4 }
or { 2, 5, 6, 3, 4, 2, 5, 6, 3, 4, 2, 5, 6, 3, 4, 2, 5, 6, 3, 4 }

which must return as identified sequence for the two example
{ 2, 5, 7, 8, 2, 6, 5, 3, 5, 4 } and { 2, 5, 6, 3, 4 }

I tried with a list and a sub-list made of two arrays but with no success. I must be missing something in my loops . I think to the "tortoise and hare" algorithm as an alternative but i miss some knowledge in bash commands to implement it .

I prefer to post my second try with tortoise and hare as the first seem to be a useless try :

#!/bin/bash
declare -A array=( 1, 2, 3, 1, 2, 3, 1, 2, 3 )
declare -A found=()
loop="notfound"
tortoise=`echo ${array[0]}`
hare=`echo ${array[0]}`
found[0]=`echo ${array[0]}`
while ( $loop == "notfound" )
do
    for ((i=1;i=`echo ${#array[@]}`;i++))
    do
        if (( `echo ${array[$#]}` == $hare ))
        then
            echo "no loop found"
            exit 0
        fi
        hare=`echo ${array[$i]}`
        if (( `echo ${array[$#]}` == $hare ))
        then
            echo "no loop found"
            exit 0
        fi
        hare=`echo ${array[$(($i+1))]}`
        tortoise=`echo  ${array[$i]}`
        found[$i]=`echo  ${array[$i]}`
        if (( $hare == $tortoise ))
        then
            loop="found"
            printf "$found[@]}"
        fi
    done
done

I got errors on associative array needing indice

Upvotes: 1

Views: 422

Answers (2)

steeldriver
steeldriver

Reputation: 344

Given an array a of single decimal digits

a=(2 5 7 8 2 6 5 3 5 4 2 5 7 8 2 6 5 3 5 4 2 5 7 8 2 6 5 3 5 4)

then using regular expression backsubstitution, for example in perl

printf '%d' "${a[@]}" | perl -lne 'print $1 if /^(\d+)\1+/'
2578265354

Testing with an incomplete sequence

a=(1 2 3 1 2 3 1 2)
printf '%d' "${a[@]}" | perl -lne 'print $1 if /^(\d+)\1+/'
123

If you only want complete repeats, add a $ line anchor to the RE, /^(\d+)\1+$/


Now, if you want to identify the longest subsequence that is "most nearly" repeated, that's a little trickier. For example, in the case of your 250-digit sequence, there is a 118-digit subsequence, repeated 2 times (with 16 characters left over), whereas your expected output is a 13-digit subsequence (repeated 19 times, with 3 digits left over). So you want an algorithm that is "greedy but not too greedy".

One (hopefully not too inefficient) way to do that would be to successively remove trailing digits until an anchored match is obtained i.e. the entire remaining sequence s* may be represented as n x t for some subsequence t. In perl, we can write that as a simple loop

perl -lne 'while (! s/^(\d+)\1+$/$1/) {chop $_}; print'

Testing with your 250-digit sequence:

a=( 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 )

Then

printf '%d' "${a[@]}" | perl -lne 'while (! s/^(\d+)\1+$/$1/) {chop $_}; print'
1102120020222

NOTE: this will fail to terminate if the string is exhausted before a match is found; if that's a possibility, you will need to test for that and break out of the while loop.

Upvotes: 1

jai_s
jai_s

Reputation: 101

I tested this only with the inputs you provided. assumptions - pattern to match always starts at the beginning of the array and repeats there after.

#!/bin/bash

#arr=(2 5 7 8 2 6 5 3 5 4  2  5  7  8  2  6  5  3  5  4  2  5  7  8  2  6  5  3  5  4)
arr=(2  5  6  3  4  2  5  6  3  4  2  5  6  3  4  2  5  6  3  4)

echo ${arr[@]}
n=${#arr[*]}
match=0
in_pattern=false

print_array()
{
  local first=$1 
  local last=$2
  local i

  for ((i=first; i<=last; i++));do
    printf "%d " ${arr[i]}
  done
  printf "\n"
}

i=0
start=0
end=0
j=$((i+1))

while (( j < n )); do
  #echo "arr[$i] ${arr[i]}  arr[$j] ${arr[j]}"
  if [[ ${arr[i]} -ne ${arr[j]} ]];then
    if [[ $match -ge 1 ]];then 
      echo "arr[$i] != arr[$j]"
      echo "pattern doesnt repeat after match # $match"
      exit 1
    fi
    ((j++))
    i=0
    in_pattern=false
    continue
  fi
  if $in_pattern ; then 
    if [[ $i -eq $end ]];then
      ((match++))
      end_match=$j
      echo "match # $match matched from $start -> $end and $start_match -> $end_match"  
      print_array $start $end
      print_array $start_match $end_match
      ((j++))
      i=0
      in_pattern=false
      continue
    fi
  else
    if [[ $match -eq 0 ]];then
      end=$((j-1))
    fi
    start_match=$j 
    in_pattern=true 
    #echo "trying to match from start $start end $end to start_match $start_match" 
  fi
  ((i++))
  ((j++))
done


output with first array -

./sequence.sh 
2 5 7 8 2 6 5 3 5 4 2 5 7 8 2 6 5 3 5 4 2 5 7 8 2 6 5 3 5 4
match # 1 matched from 0 -> 9 and 10 -> 19
2 5 7 8 2 6 5 3 5 4 
2 5 7 8 2 6 5 3 5 4 
match # 2 matched from 0 -> 9 and 20 -> 29
2 5 7 8 2 6 5 3 5 4 

2nd array -

/sequence.sh 
2 5 6 3 4 2 5 6 3 4 2 5 6 3 4 2 5 6 3 4
match # 1 matched from 0 -> 4 and 5 -> 9
2 5 6 3 4 
2 5 6 3 4 
match # 2 matched from 0 -> 4 and 10 -> 14
2 5 6 3 4 
2 5 6 3 4 
match # 3 matched from 0 -> 4 and 15 -> 19
2 5 6 3 4 
2 5 6 3 4 

Upvotes: 0

Related Questions