I have a Zarr store of daily weather data over France that seems to be in good shape, and for which ds.chunksizes gives:
Frozen({'time': (49, 49, 49, 49, 49, 49, 49, 49, 49, 44), 'latitude': (105,), 'longitude': (161,)})
I append 32 days of data with:
```python
from dask.distributed import Client, LocalCluster

# each appended file holds 8 days, so this still gives a time chunk size of 8
ds = ds.chunk(chunks={"time": 49, "latitude": 105, "longitude": 161})
with Client(LocalCluster(processes=False, n_workers=1, threads_per_worker=1)):
    ds.to_zarr(mapper, mode="a", consolidated=True, append_dim="time", safe_chunks=True)
```
so that I get:
Frozen({'time': (49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 27), 'latitude': (105,), 'longitude': (161,)})
but the data that was already there before the append gets flipped along latitude, while the newly appended data is fine.
As it may not be obvious, here are France, Spain, Corsica and the UK:
[image omitted: map of the data covering France, Spain, Corsica and the UK]
The appended data itself is correct (I checked the NetCDF file). I use zarr==2.17.2, fsspec>=2024.2.0, s3fs>=2024.2.0 and xarray==2024.3.0, and I get the same results with zarr==2.16.1 and xarray==2023.08.0.
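The chunk tuples reported above are consistent with Zarr's fixed chunk grid: every time chunk is 49 long except possibly the last one, so the 32 appended days first fill the partial chunk of 44 up to 49 (5 days) and the remaining 27 start a new chunk. A minimal sketch reproducing that arithmetic; `regular_chunks` is a hypothetical helper in plain Python, no Zarr involved:

```python
def regular_chunks(length, chunk):
    """Chunk sizes along one dimension for a regular grid of size `chunk`,
    where only the last chunk may be partial (as in a Zarr array)."""
    full, rest = divmod(length, chunk)
    return (chunk,) * full + ((rest,) if rest else ())


before = regular_chunks(9 * 49 + 44, 49)       # 485 days in the store
after = regular_chunks(9 * 49 + 44 + 32, 49)   # after appending 32 days
print(before)  # (49, 49, 49, 49, 49, 49, 49, 49, 49, 44)
print(after)   # (49, 49, 49, 49, 49, 49, 49, 49, 49, 49, 27)
```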
The Zarr store I append to was built with:
```python
import zarr

compressor = zarr.Blosc(cname="zstd", clevel=3)
encoding = {vname: {"compressor": compressor} for vname in data.data_vars}
data.to_zarr(mapper, mode="w", encoding=encoding, consolidated=True)
```
In the same script run, I initialize the store this way with 10 days of data, then switch to the append code, and those appends worked.
I tried to rechunk the original store (to the same chunk sizes) before the append:
```python
import xarray as xr

ds = xr.open_zarr(mapper)
ds = ds.sortby(["time", "latitude", "longitude"])
# convert longitudes from 0..360 to -180..180
ds = ds.assign_coords(longitude=[lon - 360 if lon > 180 else lon for lon in ds.longitude.values])
ds = ds.sortby("longitude")
ds = ds.squeeze(drop=True)
ds = ds.unify_chunks()
```
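The longitude line in the snippet above maps 0..360 longitudes to -180..180 while leaving 180 itself unchanged. A vectorized NumPy equivalent avoids the Python-level loop over the coordinate; `wrap_longitude` is a hypothetical helper name, and this is a sketch of the same rule, not a fix for the flip:

```python
import numpy as np


def wrap_longitude(lon):
    """Map longitudes in [0, 360) to (-180, 180] with the same rule as the
    list comprehension: subtract 360 only where lon > 180."""
    lon = np.asarray(lon, dtype=float)
    return np.where(lon > 180, lon - 360, lon)


# values: 0, 90, 180, -179, -90, -0.25
print(wrap_longitude([0.0, 90.0, 180.0, 181.0, 270.0, 359.75]))
```

With the dataset from the question, the result could be used as ds.assign_coords(longitude=wrap_longitude(ds.longitude.values)) followed by the same ds.sortby("longitude").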
I also tried to rechunk (to the same chunk sizes) both the resulting store and the initial one, with code based on the rechunker package.
But I always get the same flipped data.
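To confirm the flip programmatically rather than visually, one option is to read one time step of a variable from the store before appending, read the same time step again afterwards, and test whether the second is the first reversed along the latitude axis. A minimal NumPy sketch; `flipped_along` is a hypothetical helper, and the 2-D fields are assumed to be laid out as (latitude, longitude), so latitude is axis 0:

```python
import numpy as np


def flipped_along(before, after, axis=0):
    """True if `after` equals `before` reversed along `axis`, and the two
    are not simply identical (a field symmetric along `axis` is not counted)."""
    before = np.asarray(before)
    after = np.asarray(after)
    if before.shape != after.shape:
        return False
    identical = np.allclose(before, after, equal_nan=True)
    reversed_match = np.allclose(np.flip(before, axis=axis), after, equal_nan=True)
    return reversed_match and not identical


field = np.arange(12.0).reshape(3, 4)     # toy (latitude, longitude) field
print(flipped_along(field, field[::-1]))  # True
print(flipped_along(field, field))        # False
```

With the store from the question, before and after could be, for example, xr.open_zarr(mapper)[name].isel(time=0).values read before and after the append, where name is one of the data variables.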
On the other hand, if I create a new Zarr store with chunks
Frozen({'time': (44,), 'latitude': (105,), 'longitude': (161,)})
from newly downloaded data and append to it with the same code, I have no problem.
Now, if I create a new Zarr store by extracting only the last 2 chunks of the initial data, hence with
Frozen({'time': (49, 44), 'latitude': (105,), 'longitude': (161,)})
then I get the latitude flip again.
And if I create a new Zarr store by extracting only the last chunk of the initial data, hence with
Frozen({'time': (44,), 'latitude': (105,), 'longitude': (161,)})
then I get the latitude flip too.
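Because the flip shows up whenever the store is derived from the initial data, but not when it is built from freshly downloaded data, one cheap thing to record for every source (each downloaded NetCDF file, the existing store, each extract) is the direction of its latitude coordinate. A sketch with a hypothetical helper `coord_direction`, assuming the coordinate is one-dimensional and numeric:

```python
import numpy as np


def coord_direction(values):
    """Return 'ascending', 'descending' or 'unsorted' for a 1-D coordinate.
    A length-1 coordinate has no differences and counts as 'ascending'."""
    diffs = np.diff(np.asarray(values, dtype=float))
    if np.all(diffs > 0):
        return "ascending"
    if np.all(diffs < 0):
        return "descending"
    return "unsorted"


print(coord_direction([41.0, 41.25, 41.5]))  # ascending
print(coord_direction([51.0, 50.75, 50.5]))  # descending
```

For instance, coord_direction(xr.open_zarr(mapper)["latitude"].values) could be logged next to the same call on each downloaded file opened with xr.open_dataset.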
So my data seems to be corrupted, but I already re-downloaded everything after I first observed this flipped latitude. How can I detect the problem so that I can correct it during the download/zarrification process?
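One way to catch this during the download/zarrification process, rather than afterwards, is to compare the non-appended coordinates of each new batch with those already in the store before every append, and stop if they differ. This rests on an unverified assumption: that the symptom appears when the latitude values, or their order, in the data being appended do not match the store. The xarray documentation for to_zarr describes mode="a" as overriding existing variables, including dimension coordinates, which is one way such a mismatch could go unnoticed. A minimal sketch; `assert_coords_match` is a hypothetical helper that only needs obj[name] to return something array-like, so it accepts xarray Datasets as well as plain dicts of NumPy arrays:

```python
import numpy as np


def assert_coords_match(existing, new, dims=("latitude", "longitude")):
    """Raise ValueError if a non-appended coordinate differs between the
    existing store (`existing`) and the data about to be appended (`new`)."""
    for dim in dims:
        old = np.asarray(existing[dim], dtype=float)
        cur = np.asarray(new[dim], dtype=float)
        if old.shape != cur.shape:
            raise ValueError(f"{dim}: shape {old.shape} in store, {cur.shape} in new data")
        if np.allclose(old, cur):
            continue  # same values in the same order: safe to append
        if np.allclose(old, cur[::-1]):
            raise ValueError(f"{dim}: new data is in reverse order compared with the store")
        raise ValueError(f"{dim}: coordinate values differ from the store")
```

In the question's workflow this would run as assert_coords_match(xr.open_zarr(mapper), ds) just before ds.to_zarr(..., append_dim="time"). Raising, instead of silently reordering, keeps the mismatch visible; once detected, ds.sortby("latitude", ascending=...) set to match the store's order would be one way to correct it.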
It may be related to "xarray to_netcdf group flipping latitude".
Thank you