Mael
Mael

Reputation: 1322

merge two sorted arrays in julia

Is there a neat function in julia which will merge two sorted arrays and return the sorted array for me? I have written:

c=1
p=1
i=1
n=length(tc)+length(tp)
t=Array{Float64}(n)
while(c<=length(tc) && p<=length(tp))
    if(tp[p]<tc[c])
        t[i]=tp[p]
        p=p+1;
        i=i+1;
    else
        t[i]=tc[c]
        c=c+1;
        i=i+1;
    end
end
while(p<=length(tp))
    t[i]=tp[p]
    i=i+1
    p=p+1
end
while(c<=length(tc))
    t[i]=tc[c]
    i=i+1
    c=c+1
end

but is there no native function in base julia to do this?

Upvotes: 6

Views: 1699

Answers (4)

Colin T Bowers
Colin T Bowers

Reputation: 18560

Contrary to the other answers, there is in fact a method to do this in base Julia. BUT, it only works for arrays of integers, AND it will only work if the arrays are unique (in the sense that no integer is repeated in either array). Simply use the IntSet type as follows:

a = [2, 3, 4, 8]
b = [1, 5]
union(IntSet(a), IntSet(b))

If you run the above code, you'll note that the union function removes duplicates from the output, which is why I stated initially that your arrays must be unique (or else you must be happy to have duplicates removed in the output). You'll also notice that the union operation on the IntSet works much faster than union on a sorted Vector{Int}, since the former exploits the fact that an IntSet is pre-sorted.

Of course, the above is not really in the spirit of the question, which more concerns a solution for any type for which the lt operator is defined, as well as allowing for duplicates.

Here is a function that efficiently finds the union of two pre-sorted unique vectors. I've never had a need for the non-unique case myself so have not written a function that covers that case I'm afraid:

"union <- Return the union of the inputs as a new sorted vector"
function union_vec(x::Vector{T}, y::Vector{T})::Vector{T} where {T}
    (nx, ny) = (1, 1)
    z = T[]
    while nx <= length(x) && ny <= length(y)
        if x[nx] < y[ny]
            push!(z, x[nx])
            nx += 1
        elseif y[ny] < x[nx]
            push!(z, y[ny])
            ny += 1
        else
            push!(z, x[nx])
            nx += 1
            ny += 1
        end
    end
    if nx <= length(x)
        [ push!(z, x[n]) for n = nx:length(x) ]
    elseif ny <= length(y)
        [ push!(z, y[n]) for n = ny:length(y) ]
    end
    return z
end

Another option is to look at sorted dictionaries, available in the DataStructures.jl package. I haven't done it myself, but a method that just inserts all observations into a sorted dictionary (checking for key duplication as you go) and then iterates over (keys, values) should also be a fairly efficient way to attack this problem.

Upvotes: 4

vvjn
vvjn

Reputation: 106

I keep coming across this in different projects, so I made a package MergeSorted (https://github.com/vvjn/MergeSorted.jl). You can use it as follows.

using MergeSorted

a = sort!(rand(1000))
b = sort!(rand(1000))
c = mergesorted(a,b)
sort!(vcat(a,b)) == c 

Or without allocating new memory.

mergesorted!(c, a, b)

You can also use all of the sort options.

a = sort!(rand(1000), order=Base.Reverse)
b = sort!(rand(1000), order=Base.Reverse)
c = mergesorted(a,b, order=Base.Reverse)
sort!(vcat(a,b), order=Base.Reverse) == c 

It's around 4-6 times faster than sort!(vcat(a,b)), which uses QuickSort by default, and twice as fast as sort!(vcat(a,b), alg=MergeSort) but MergeSort uses more memory.

Upvotes: 3

Dan Getz
Dan Getz

Reputation: 18217

Although an explicit function to merge two sorted vectors seems to be missing, one can be constructed easily from the existing building blocks (the question actually demonstrated this, but it doesn't define a function).

The following method tries to leverage the existing sort code and still remain efficient.

In code:

mergesorted(a,b) = sort!(vcat(a,b))

The following is an example:

julia> a = [1:2:11...];

julia> b = [2:3:20...];

julia> show(a)
[1,3,5,7,9,11]
julia> show(b)
[2,5,8,11,14,17,20]
julia> show(mergesorted(a,b))
[1,2,3,5,5,7,8,9,11,11,14,17,20]

I didn't benchmark the function, but QuickSort (the default sort algorithm) is usually good performing on pre-sorted arrays, so it should be OK and the allocation of a result vector is required in any implementation.

Upvotes: 3

Salvador Dali
Salvador Dali

Reputation: 222581

No, such function does not exist. And actually I have not seen a language which has such function out of the box.

To do this, you have to maintain two pointers in each of the arrays, compare the values and move the smaller (based on what I see, this is exactly what you do).

Upvotes: 2

Related Questions