Reputation: 1923
Suppose I have converted a simple two-column dataframe to a NumPy array:
gdf.head()
>>>
rid rast
0 1 01000001000761C3ECF420013F0761C3ECF42001BF7172...
1 2 01000001000761C3ECF420013F0761C3ECF42001BF64BF...
2 3 01000001000761C3ECF420013F0761C3ECF42001BF560C...
3 4 01000001000761C3ECF420013F0761C3ECF42001BF7F25...
4 5 01000001000761C3ECF420013F0761C3ECF42001BF7172...
raster_np = gdf.to_numpy()
raster_np[0]
>>> array([1, '01000001000761C3E.........'], dtype=object)
I've been tasked with converting the NumPy array to the Zarr file format. Because of the size of the rast values and of the dataframe, chunking and compression might be necessary, and I assume the new .zarr files would work better in an S3/cloud storage environment. I created a simple Zarr array like so:
z_test = z.zeros(shape=(10000, 2), chunks=(10000, 2))
z_test
>>> <zarr.core.Array (10000, 2) float64>
Now, how do I get the data in raster_np into z_test and retain the Zarr attributes? Simply using z_test = raster_np obviously doesn't work. Perhaps there is something I am misunderstanding about Zarr. Any suggestions?
Upvotes: 3
Views: 1894
Reputation: 2055
z_test = zarr.array(raster_np)
See https://zarr.readthedocs.io/en/stable/api/creation.html#zarr.creation.array
and https://zarr.readthedocs.io/en/stable/api/hierarchy.html#zarr.hierarchy.Group.array
Upvotes: 0
Reputation: 2948
Since your initial array is of mixed type (object dtype), you need to create the Zarr array with the correct data type and encode the data. You can use the JSON encoder from numcodecs:
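import numcodecs
import zarr
# object dtype needs an explicit codec so Zarr knows how to serialize each element
z_test = zarr.zeros(shape=(10000, 2), dtype=object, object_codec=numcodecs.JSON())
# slice assignment copies the data in; the shape must match raster_np
z_test[:] = raster_np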
You will, however, get better performance if you store the rid and rast columns as separate arrays with int and str datatypes respectively, or convert the hex to another base.
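As a rough sketch of that last suggestion, the snippet below splits (rid, rast) pairs into two homogeneous columns and decodes each hex string into raw bytes. The sample rows and the helper name split_columns are hypothetical, invented only for illustration; the hex strings are made up and much shorter than real raster values. It uses only the standard library and stops before the Zarr write itself.

```python
# Hypothetical stand-in rows for the (rid, rast) dataframe; the hex strings
# are made up and far shorter than real raster values.
rows = [(1, "01000001"), (2, "0100FF7F"), (3, "00A1B2C3")]


def split_columns(rows):
    """Split (rid, hex string) pairs into a list of ints and a list of bytes."""
    rids = [int(rid) for rid, _ in rows]
    # bytes.fromhex turns every two hex characters into one byte.
    rasters = [bytes.fromhex(hex_str) for _, hex_str in rows]
    return rids, rasters


rids, rasters = split_columns(rows)

# Compare the size of the hex text with the size of the decoded payload.
hex_chars = sum(len(hex_str) for _, hex_str in rows)
raw_bytes = sum(len(raster) for raster in rasters)
print(rids)
print(hex_chars, raw_bytes)
```

Each column is now of a single type, so the two lists could be written to separate Zarr arrays (one integer, one bytes or string) instead of one object array. How best to store the variable-length column depends on the Zarr version in use, so that step is left out here.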
Upvotes: 1