Jared

Reputation: 616

Get only elements of one array that are in another array

I'm learning Julia, coming from Python. I want to get the elements of an array b that also appear in an array a. Below is what I need in Python, followed by my attempt in Julia. My question: is there a better or faster way to do this in Julia? The Julia version looks so simple that I suspect it is naive, and (coming from Python) I worry that it might have suboptimal performance.

Python:

import numpy as np
a = np.array([1, 2, 3, 4])
b = np.array([7, 8, 2, 3, 5])
indices_b_in_a = np.nonzero(np.isin(b, a))
b_in_a = b[indices_b_in_a]
# array([2, 3])

Julia:

a = [1, 2, 3, 4];
b = [7, 8, 2, 3, 5];
indices_b_in_a = findall(ele -> ele in a, b);
b_in_a = b[indices_b_in_a];
#2-element Vector{Int64}:
# 2
# 3

Upvotes: 3

Views: 913

Answers (2)

Shayan

Reputation: 6295

Maybe this would be a helpful answer:

julia> intersect(Set(a), Set(b))
Set{Int64} with 2 elements:
  2
  3

# Or even
julia> intersect(a, b)
2-element Vector{Int64}:
 2
 3

Note that if the arrays contain repeated values, this method does not exactly replicate your expected behavior, because intersect works on unique values only. If you need to keep repeated elements, use an element-by-element search instead; in that case, binary search on a sorted copy of a would be a good choice.
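To make the point about repeated values concrete, here is a minimal sketch. The input values are made up for illustration, and the binary-search variant assumes Julia 1.6 or later, where `insorted` is available in Base:

```julia
a = [1, 2, 3, 4]
b = [7, 2, 2, 3, 5, 3]   # illustrative input with repeated values

# intersect works on unique values, so the repeats in b are dropped
unique_matches = intersect(b, a)      # [2, 3]

# filter tests each element of b, so repeats and the order of b are kept
all_matches = filter(in(a), b)        # [2, 2, 3, 3]

# Binary-search variant: sort a once, then test membership with insorted
a_sorted = sort(a)
sorted_matches = filter(x -> insorted(x, a_sorted), b)
```

The `filter` versions return the same elements as the `findall` plus indexing approach from the question.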
Another approach is using broadcasting in Julia:

julia> a = rand(1:100, 1000);
       b = rand(1:3000, 5000);

julia> b[in.(b, Ref(a))]
161-element Vector{Int64}:
  8
  5
 70
 73
  ⋮

# Exactly the same approach with a slightly different syntax
julia> b[b.∈Ref(a)]
161-element Vector{Int64}:
  8
  5
 70
 73
 30
 63
 73
  ⋮

Q: What is the role of Ref in the above code block?
Ans: Wrapping a in Ref makes broadcasting treat a as a single (scalar) value instead of iterating over it. Without Ref, broadcasting would iterate over the elements of a and b simultaneously and compare them pairwise, which is not what we want, even if both objects have the same length. If their shapes cannot be broadcast together, it throws a DimensionMismatch instead.
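The effect of Ref can be seen in a small sketch. The arrays here are illustrative and deliberately have the same length, so that the version without Ref runs instead of throwing:

```julia
a = [1, 2, 3, 4]
b = [7, 8, 2, 3]   # same length as a, on purpose

# Without Ref: broadcasting pairs the elements, i.e. it evaluates b[i] in a[i]
pairwise = in.(b, a)          # all false: 7 in 1, 8 in 2, 2 in 3, 3 in 4

# With Ref: a is treated as a single value, so each b[i] is tested against all of a
with_ref = in.(b, Ref(a))     # false, false, true, true

# A one-element tuple protects a from broadcasting in the same way
with_tuple = in.(b, (a,))

b_in_a = b[with_ref]          # [2, 3]
```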
Julia's syntax has its own idioms, but it is not that complicated. I mention this because you wrote:

I worry that such a naive looking solution...

Last but not least, do not forget to wrap your code in a function if you want to get good performance in Julia.
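As a rough sketch of what wrapping the code in a function could look like: the name `elements_in` is hypothetical, and building a `Set` for the lookups is an assumption of this sketch (it tends to pay off when a is large), not something the answer above prescribes.

```julia
# Hypothetical helper: returns the elements of b that also occur in a,
# keeping the order and the repeats of b.
function elements_in(b, a)
    lookup = Set(a)                  # hash-based membership test
    return filter(in(lookup), b)     # keep each element of b found in lookup
end

a = [1, 2, 3, 4]
b = [7, 8, 2, 3, 5]
b_in_a = elements_in(b, a)           # [2, 3]
```

Inside a function the variables are local and their types are known to the compiler, which avoids the overhead of working with untyped global variables.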

Upvotes: 5

Andre Wildberg

Reputation: 19088

Another approach, using an array comprehension:

julia> [i for i in a for j in b if i == j]
2-element Vector{Int64}:
 2
 3
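One thing to be aware of: the nested comprehension above iterates over a in the outer loop, so its result follows the order of a, and it compares every pair of elements. A single-loop variant that follows the order of b, like the `findall` version in the question, might look like this (the input values are illustrative):

```julia
a = [1, 2, 3, 4]
b = [3, 7, 2, 2]   # illustrative input: different order, with a repeat

# Nested comprehension: outer loop over a, so the result is ordered like a
nested = [i for i in a for j in b if i == j]    # [2, 2, 3]

# Single loop over b with a membership test: the result is ordered like b
single = [x for x in b if x in a]               # [3, 2, 2]
```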

Upvotes: 2
