Reputation: 13
The main goal is to find a periodic sequence in an array with bash,for example :
{ 2, 5, 7, 8, 2, 6, 5, 3, 5, 4, 2, 5, 7, 8, 2, 6, 5, 3, 5, 4, 2, 5, 7, 8, 2, 6, 5, 3, 5, 4 }
or { 2, 5, 6, 3, 4, 2, 5, 6, 3, 4, 2, 5, 6, 3, 4, 2, 5, 6, 3, 4 }
which must return as identified sequence for the two example
{ 2, 5, 7, 8, 2, 6, 5, 3, 5, 4 } and { 2, 5, 6, 3, 4 }
I tried with a list and a sub-list made of two arrays but with no success. I must be missing something in my loops . I think to the "tortoise and hare" algorithm as an alternative but i miss some knowledge in bash commands to implement it .
I prefer to post my second try with tortoise and hare as the first seem to be a useless try :
#!/bin/bash
declare -A array=( 1, 2, 3, 1, 2, 3, 1, 2, 3 )
declare -A found=()
loop="notfound"
tortoise=`echo ${array[0]}`
hare=`echo ${array[0]}`
found[0]=`echo ${array[0]}`
while ( $loop == "notfound" )
do
for ((i=1;i=`echo ${#array[@]}`;i++))
do
if (( `echo ${array[$#]}` == $hare ))
then
echo "no loop found"
exit 0
fi
hare=`echo ${array[$i]}`
if (( `echo ${array[$#]}` == $hare ))
then
echo "no loop found"
exit 0
fi
hare=`echo ${array[$(($i+1))]}`
tortoise=`echo ${array[$i]}`
found[$i]=`echo ${array[$i]}`
if (( $hare == $tortoise ))
then
loop="found"
printf "$found[@]}"
fi
done
done
I got errors on associative array needing indice
Upvotes: 1
Views: 422
Reputation: 344
Given an array a
of single decimal digits
a=(2 5 7 8 2 6 5 3 5 4 2 5 7 8 2 6 5 3 5 4 2 5 7 8 2 6 5 3 5 4)
then using regular expression backsubstitution, for example in perl
printf '%d' "${a[@]}" | perl -lne 'print $1 if /^(\d+)\1+/'
2578265354
Testing with an incomplete sequence
a=(1 2 3 1 2 3 1 2)
printf '%d' "${a[@]}" | perl -lne 'print $1 if /^(\d+)\1+/'
123
If you only want complete repeats, add a $
line anchor to the RE, /^(\d+)\1+$/
Now, if you want to identify the longest subsequence that is "most nearly" repeated, that's a little trickier. For example, in the case of your 250-digit sequence, there is a 118-digit subsequence, repeated 2 times (with 16 characters left over), whereas your expected output is a 13-digit subsequence (repeated 19 times, with 3 digits left over). So you want an algorithm that is "greedy but not too greedy".
One (hopefully not too inefficient) way to do that would be to successively remove trailing digits until an anchored match is obtained i.e. the entire remaining sequence s*
may be represented as n x t
for some subsequence t
. In perl, we can write that as a simple loop
perl -lne 'while (! s/^(\d+)\1+$/$1/) {chop $_}; print'
Testing with your 250-digit sequence:
a=( 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 2 1 2 0 0 2 0 2 2 2 1 1 0 )
Then
printf '%d' "${a[@]}" | perl -lne 'while (! s/^(\d+)\1+$/$1/) {chop $_}; print'
1102120020222
NOTE: this will fail to terminate if the string is exhausted before a match is found; if that's a possibility, you will need to test for that and break out of the while
loop.
Upvotes: 1
Reputation: 101
I tested this only with the inputs you provided. assumptions - pattern to match always starts at the beginning of the array and repeats there after.
#!/bin/bash
#arr=(2 5 7 8 2 6 5 3 5 4 2 5 7 8 2 6 5 3 5 4 2 5 7 8 2 6 5 3 5 4)
arr=(2 5 6 3 4 2 5 6 3 4 2 5 6 3 4 2 5 6 3 4)
echo ${arr[@]}
n=${#arr[*]}
match=0
in_pattern=false
print_array()
{
local first=$1
local last=$2
local i
for ((i=first; i<=last; i++));do
printf "%d " ${arr[i]}
done
printf "\n"
}
i=0
start=0
end=0
j=$((i+1))
while (( j < n )); do
#echo "arr[$i] ${arr[i]} arr[$j] ${arr[j]}"
if [[ ${arr[i]} -ne ${arr[j]} ]];then
if [[ $match -ge 1 ]];then
echo "arr[$i] != arr[$j]"
echo "pattern doesnt repeat after match # $match"
exit 1
fi
((j++))
i=0
in_pattern=false
continue
fi
if $in_pattern ; then
if [[ $i -eq $end ]];then
((match++))
end_match=$j
echo "match # $match matched from $start -> $end and $start_match -> $end_match"
print_array $start $end
print_array $start_match $end_match
((j++))
i=0
in_pattern=false
continue
fi
else
if [[ $match -eq 0 ]];then
end=$((j-1))
fi
start_match=$j
in_pattern=true
#echo "trying to match from start $start end $end to start_match $start_match"
fi
((i++))
((j++))
done
output with first array -
./sequence.sh
2 5 7 8 2 6 5 3 5 4 2 5 7 8 2 6 5 3 5 4 2 5 7 8 2 6 5 3 5 4
match # 1 matched from 0 -> 9 and 10 -> 19
2 5 7 8 2 6 5 3 5 4
2 5 7 8 2 6 5 3 5 4
match # 2 matched from 0 -> 9 and 20 -> 29
2 5 7 8 2 6 5 3 5 4
2nd array -
/sequence.sh
2 5 6 3 4 2 5 6 3 4 2 5 6 3 4 2 5 6 3 4
match # 1 matched from 0 -> 4 and 5 -> 9
2 5 6 3 4
2 5 6 3 4
match # 2 matched from 0 -> 4 and 10 -> 14
2 5 6 3 4
2 5 6 3 4
match # 3 matched from 0 -> 4 and 15 -> 19
2 5 6 3 4
2 5 6 3 4
Upvotes: 0