jjjjjj

Reputation: 1172

Save array of arrays, HDF5, Julia

Cross-posted here, but how can I save an array of arrays in Julia using HDF5?

In my particular case I have a single array containing 10,000 arrays of varying lengths. I'd like the 10,000 arrays to be part of a "group", but creating new datasets / groups for each array makes reading the file very slow, so I am seeking an alternative.

Upvotes: 2

Views: 1139

Answers (1)

Maurice van Leeuwen

Reputation: 191

You can flatten the array of arrays into a single two-row matrix: one row holds the original data, and the other row records the index of the array each value came from.

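using HDF5

# Define your array of arrays.
arr = [[1, 2], [3, 4, 5]]

# Open the HDF5 file for writing.
h5open("data.hdf5", "w") do f
    # Create an extendible 2×N dataset, where N is the combined length of all arrays.
    # Row 1 stores the index of the source array; row 2 stores the values.
    # Older HDF5.jl releases named this function d_create and took "chunk" as a positional pair.
    N = sum(length, arr)
    d = create_dataset(f, "X", Int, ((2, N), (2, -1)), chunk=(1, 1000))

    n = 1  # next free column in the dataset
    for i in eachindex(arr)
        m = length(arr[i])
        d[1, n:n+m-1] = fill(i, m)
        d[2, n:n+m-1] = arr[i]
        n += m
    end
    print(d[:, :])
end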

The arrays are then stored as follows:

> [1 1 2 2 2; 1 2 3 4 5]
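
To get the original array of arrays back, you can group the values in the second row by the index in the first row. The following is only a sketch: `unflatten` is a name made up for illustration, and it assumes the two-row layout shown above with indices starting at 1. In practice the matrix `X` would come from the file, for example via `h5read("data.hdf5", "X")`; here it is written out literally so the snippet runs without HDF5.

```julia
# Rebuild the array of arrays from the flattened 2×N matrix.
# Row 1 holds the index of the source array, row 2 holds the values.
# `unflatten` is a hypothetical helper, not part of HDF5.jl.
function unflatten(X::AbstractMatrix{T}) where {T}
    # The largest index in row 1 tells us how many arrays to allocate.
    count = isempty(X) ? 0 : Int(maximum(view(X, 1, :)))
    out = [T[] for _ in 1:count]
    for col in axes(X, 2)
        push!(out[X[1, col]], X[2, col])
    end
    return out
end

X = [1 1 2 2 2; 1 2 3 4 5]
arr = unflatten(X)
```

Note that an empty inner array leaves no column in the flattened form, so trailing empty arrays cannot be recovered this way unless the total number of arrays is stored separately.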

Upvotes: 2
