Nicky Feller
Nicky Feller

Reputation: 3889

Julia: does an Array contain a specific sub-array

In julia we can check if an array contains a value, like so:

> 6 in [4,6,5]
true

However this returns false, when attempting to check for a sub-array in a specific order:

> [4,6] in [4,6,5]
false

What is the correct syntax to verify if a specific sub-array exists in an array?

Upvotes: 14

Views: 11232

Answers (7)

Harry Ray
Harry Ray

Reputation: 21

There is no standard Julia library function to determine if a particular sequence occurs as a subsequence of another. Probably this is because this is actually a fairly trickly problem (known as the String-searching algorithm) and quite how to do it depends on whether you'll be searching repeatedly, whether you want to do multiple matches, have multiple patterns, want fuzzy matches, etc.

The other answers here give reasonable answers but some are old and Julia has improved, and I wanted to offer a slightly more idiomatic solution.

function issubarray(needle, haystack)
    getView(vec, i, len) = view(vec, i:i+len-1)
    ithview(i) = getView(haystack, i, length(needle))
    return any(i -> ithview(i) == needle, 1:length(haystack)-length(needle)+1)
end

This is lightening fast and requires almost no memory - Julia's view is lightweight and efficient. And, as always with Julia, the solution is generally to simply define more functions.

Upvotes: 2

b-fg
b-fg

Reputation: 4137

Here is a more up-to-date implementation using findall

function issubsequence(A, B)
    B1inA = findall(isequal(B[1]), A) # indices of the first element of b occuring in a
    matchFirstIndex = [] # Saves the first index in A of the occurances
    for i in B1inA
        if length(A[i:end]) < length(B) continue end
        if A[i:i + length(B) - 1] == B
            push!(matchFirstIndex, i)
        end
    end
    return matchFirstIndex
end

I get a similar runtime to @daycaster

@time issubsequence(rand(1:9, 1000), [2,3])
  0.000038 seconds (111 allocations: 20.844 KiB)
7-element Vector{Any}:
  57
 427
 616
 644
 771
 855
 982

Upvotes: 0

Adrien
Adrien

Reputation: 31

note that you can now vectorize in with a dot:

julia> in([4,6,5]).([4, 6])
2-element BitArray{1}:
 true
 true

and chain with all to get your answer:

julia> all(in([4,6,5]).([4, 6]))
true

Upvotes: 3

rodrigolece
rodrigolece

Reputation: 1109

I think it is worth mentioning that in Julia 1.0 you have the function issubset

> issubset([4,6], [4,6,5])
true

You can also quite conveniently call it using the \subseteq latex symbol

> [4,6] ⊆ [4,6,5]
true

This looks pretty optimized to me:

> using Random

> x, y = randperm(10^3)[1:10^2], randperm(10^3);

> @btime issubset(x, y);
16.153 μs (12 allocations: 45.96 KiB)

Upvotes: 12

daycaster
daycaster

Reputation: 2697

I used this recently to find subsequences in arrays of integers. It's not as good or as fast as @scott's subset2(x,y)... but it returns the indices.

function findsequence(arr::Array{Int64}, seq::Array{Int64})
    indices = Int64[]
    i = 1
    n = length(seq)
    if n == 1
        while true
            occurrence = findnext(arr, seq[1], i)
            if occurrence == 0
                break
            else
                push!(indices, occurrence)
                i = occurrence +1
            end
        end
    else
        while true
            occurrence = Base._searchindex(arr, seq, i)
            if occurrence == 0
                break
            else
                push!(indices, occurrence)
                i = occurrence +1
            end
        end
    end
    return indices
end

julia> @time findsequence(rand(1:9, 1000), [2,3])
    0.000036 seconds (29 allocations: 8.766 KB)
    16-element Array{Int64,1}:
   80
  118
  138
  158
  234
  243
  409
  470
  539
  589
  619
  629
  645
  666
  762
  856

Upvotes: 1

Scott Jones
Scott Jones

Reputation: 1750

It takes a little bit of code to make a function that performs well, but this is much faster than the issubvec version above:

function subset2(x,y)
    lenx = length(x)
    first = x[1]
    if lenx == 1
        return findnext(y, first, 1) != 0
    end
    leny = length(y)
    lim = length(y) - length(x) + 1
    cur = 1
    while (cur = findnext(y, first, cur)) != 0
        cur > lim && break
        beg = cur
        @inbounds for i = 2:lenx
            y[beg += 1] != x[i] && (beg = 0 ; break)
        end
        beg != 0 && return true
        cur += 1
    end
    false
end

Note: it would also be much more useful if the function actually returned the position of the beginning of the subarray if found, or 0 if not, similarly to the findfirst/findnext functions.

Timing information (the second one is using my subset2 function):

  0.005273 seconds (65.70 k allocations: 4.073 MB)
  0.000086 seconds (4 allocations: 160 bytes)

Upvotes: 7

Dan Getz
Dan Getz

Reputation: 18217

For the third condition i.e. vector [4,6] appears as a sub-vector of 4,6,5 the following function is suggested:

issubvec(v,big) = 
  any([v == slice(big,i:(i+length(v)-1)) for i=1:(length(big)-length(v)+1)])

For the second condition, that is, give a boolean for each element in els vectors which appears in set vector, the following is suggested:

function vecin(els,set)
  res = zeros(Bool,size(els))
  res[findin(els,set)]=true
  res
end

With the vector in the OP, these result in:

julia> vecin([4,6],[4,6,5])
2-element Array{Bool,1}:
 true
 true

julia> issubvec([4,6],[4,6,5])
true

Upvotes: 6

Related Questions