Reputation: 1
I’m stuck on a question about extracting a variable from a NetCDF file. Here is the structure of my input NetCDF file:
comephore_all
Out[37]:
<xarray.Dataset>
Dimensions:  (x: 85, y: 99, time: 236664)
Coordinates:
  * x        (x) float64 1.156e+06 1.158e+06 1.158e+06 ... 1.24e+06 1.24e+06
  * y        (y) float64 5.075e+05 5.065e+05 5.055e+05 ... 4.105e+05 4.095e+05
  * time     (time) object '1997-01-01 00:00' ... '2023-12-31 23:00'
Data variables:
    rr1      (time, y, x) float32 dask.array<chunksize=(201600, 99, 85), meta=np.ndarray>
    crs      (time) int32 -2147483647 -2147483647 ... -2147483647 -2147483647
Attributes:
    title:                     Réanalyse des lames d'eau COMEPHORE
    Conventions:               CF-1.6
    history:                   Wed Mar 24 13:39:53 2021: ncrcat 1997_rr.nc 19...
    nco_openmp_thread_number:  1
I am looking to extract the precipitation values for each station into separate .csv files.
I ran an initial test to extract the values for a single station, but it’s very slow. Here is a snippet of the code:
indice_x = 1205500
indice_y = 439500
precip_1205500_439500 = comephore_all.sel(x=indice_x, y=indice_y)
df_precip = precip_1205500_439500.to_dataframe(name='EH')[['EH','date']]
df_precip[['date', 'EH']].to_csv(output_path, index=False)
The issue is not with writing the CSV file but with extracting the precipitation values. The line "df_precip = precip_1205500_439500.to_dataframe(name='EH')[['EH', 'date']]" takes 40 minutes for a single station.
I also tried with Dask:
df_precip = precip_1205500_439500.to_dask_dataframe()
df_precip = df_precip[['time', 'rr1']]
df_precip[['time', 'rr1']].to_csv(output_path)
But the line df_precip[['time', 'rr1']].to_csv(output_path) runs for an extremely long time without ever completing.
I think I’m not approaching the extraction of precipitation values correctly.
Thank you for your help!
Upvotes: 0
Views: 44
Reputation: 196
I think you could make the process faster by skipping the conversion to a DataFrame and directly writing the data into a CSV file. This could look something like this:
times = precip_1205500_439500.time.values   # load the time coordinate into memory once
values = precip_1205500_439500.rr1.values   # load the precipitation values into memory once
with open(output_path, "w") as f:
    f.write("date,EH\n")
    for i in range(times.size):
        f.write(f"{times[i]},{values[i]:f}\n")
You mentioned that you want to do this for every location, so I assume you will loop over all locations. In this case, you might want to switch to index-based selection (using isel or square brackets instead of sel) for even more speed.
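For illustration, here is a minimal sketch of such a loop. It assumes a hypothetical stations dictionary that maps a station name to its (x, y) coordinates in the same projection as the grid, and that the nearest grid cell is what you want; the station list and output file names are placeholders, not taken from your question.

import numpy as np

# Hypothetical station list: name -> (x, y) in the grid's coordinate system.
stations = {"station_A": (1205500, 439500)}  # add your other stations here

for name, (x_station, y_station) in stations.items():
    # Convert the coordinates to integer indices once, then select by position with isel.
    ix = int(np.abs(comephore_all.x.values - x_station).argmin())
    iy = int(np.abs(comephore_all.y.values - y_station).argmin())
    point = comephore_all.rr1.isel(x=ix, y=iy)

    # One .values call loads the full time series for this grid cell into memory.
    times = comephore_all.time.values
    values = point.values

    with open(f"precip_{name}.csv", "w") as f:
        f.write("date,EH\n")
        for t, v in zip(times, values):
            f.write(f"{t},{v:f}\n")

Each station's full time series is pulled into memory with a single .values call, so the inner loop only does string formatting.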
I hope this helps to speed up your data extraction.
Cheers, Markus
Upvotes: -1