Reputation: 3889
In julia we can check if an array contains a value, like so:
> 6 in [4,6,5]
true
However this returns false, when attempting to check for a sub-array in a specific order:
> [4,6] in [4,6,5]
false
What is the correct syntax to verify if a specific sub-array exists in an array?
Upvotes: 14
Views: 11232
Reputation: 21
There is no standard Julia library function to determine if a particular sequence occurs as a subsequence of another. Probably this is because this is actually a fairly trickly problem (known as the String-searching algorithm) and quite how to do it depends on whether you'll be searching repeatedly, whether you want to do multiple matches, have multiple patterns, want fuzzy matches, etc.
The other answers here give reasonable answers but some are old and Julia has improved, and I wanted to offer a slightly more idiomatic solution.
function issubarray(needle, haystack)
getView(vec, i, len) = view(vec, i:i+len-1)
ithview(i) = getView(haystack, i, length(needle))
return any(i -> ithview(i) == needle, 1:length(haystack)-length(needle)+1)
end
This is lightening fast and requires almost no memory - Julia's view
is lightweight and efficient. And, as always with Julia, the solution is generally to simply define more functions.
Upvotes: 2
Reputation: 4137
Here is a more up-to-date implementation using findall
function issubsequence(A, B)
B1inA = findall(isequal(B[1]), A) # indices of the first element of b occuring in a
matchFirstIndex = [] # Saves the first index in A of the occurances
for i in B1inA
if length(A[i:end]) < length(B) continue end
if A[i:i + length(B) - 1] == B
push!(matchFirstIndex, i)
end
end
return matchFirstIndex
end
I get a similar runtime to @daycaster
@time issubsequence(rand(1:9, 1000), [2,3])
0.000038 seconds (111 allocations: 20.844 KiB)
7-element Vector{Any}:
57
427
616
644
771
855
982
Upvotes: 0
Reputation: 31
note that you can now vectorize in
with a dot:
julia> in([4,6,5]).([4, 6])
2-element BitArray{1}:
true
true
and chain with all
to get your answer:
julia> all(in([4,6,5]).([4, 6]))
true
Upvotes: 3
Reputation: 1109
I think it is worth mentioning that in Julia 1.0 you have the function issubset
> issubset([4,6], [4,6,5])
true
You can also quite conveniently call it using the \subseteq
latex symbol
> [4,6] ⊆ [4,6,5]
true
This looks pretty optimized to me:
> using Random
> x, y = randperm(10^3)[1:10^2], randperm(10^3);
> @btime issubset(x, y);
16.153 μs (12 allocations: 45.96 KiB)
Upvotes: 12
Reputation: 2697
I used this recently to find subsequences in arrays of integers. It's not as good or as fast as @scott's subset2(x,y)
... but it returns the indices.
function findsequence(arr::Array{Int64}, seq::Array{Int64})
indices = Int64[]
i = 1
n = length(seq)
if n == 1
while true
occurrence = findnext(arr, seq[1], i)
if occurrence == 0
break
else
push!(indices, occurrence)
i = occurrence +1
end
end
else
while true
occurrence = Base._searchindex(arr, seq, i)
if occurrence == 0
break
else
push!(indices, occurrence)
i = occurrence +1
end
end
end
return indices
end
julia> @time findsequence(rand(1:9, 1000), [2,3])
0.000036 seconds (29 allocations: 8.766 KB)
16-element Array{Int64,1}:
80
118
138
158
234
243
409
470
539
589
619
629
645
666
762
856
Upvotes: 1
Reputation: 1750
It takes a little bit of code to make a function that performs well, but this is much faster than the issubvec
version above:
function subset2(x,y)
lenx = length(x)
first = x[1]
if lenx == 1
return findnext(y, first, 1) != 0
end
leny = length(y)
lim = length(y) - length(x) + 1
cur = 1
while (cur = findnext(y, first, cur)) != 0
cur > lim && break
beg = cur
@inbounds for i = 2:lenx
y[beg += 1] != x[i] && (beg = 0 ; break)
end
beg != 0 && return true
cur += 1
end
false
end
Note: it would also be much more useful if the function actually returned the position of the beginning of the subarray if found, or 0 if not, similarly to the findfirst/findnext functions.
Timing information (the second one is using my subset2 function):
0.005273 seconds (65.70 k allocations: 4.073 MB)
0.000086 seconds (4 allocations: 160 bytes)
Upvotes: 7
Reputation: 18217
For the third condition i.e. vector [4,6]
appears as a sub-vector of 4,6,5
the following function is suggested:
issubvec(v,big) =
any([v == slice(big,i:(i+length(v)-1)) for i=1:(length(big)-length(v)+1)])
For the second condition, that is, give a boolean for each element in els
vectors which appears in set
vector, the following is suggested:
function vecin(els,set)
res = zeros(Bool,size(els))
res[findin(els,set)]=true
res
end
With the vector in the OP, these result in:
julia> vecin([4,6],[4,6,5])
2-element Array{Bool,1}:
true
true
julia> issubvec([4,6],[4,6,5])
true
Upvotes: 6