Lucie Armand

Reputation: 1

Extract precipitation values from NetCDF

I’m stuck on a question about extracting a variable from a NetCDF file. Here is the structure of my input NetCDF file:

comephore_all
Out[37]:
<xarray.Dataset>
Dimensions:  (x: 85, y: 99, time: 236664)
Coordinates:
  * x        (x) float64 1.156e+06 1.158e+06 1.158e+06 ... 1.24e+06 1.24e+06
  * y        (y) float64 5.075e+05 5.065e+05 5.055e+05 ... 4.105e+05 4.095e+05
  * time     (time) object '1997-01-01 00:00' ... '2023-12-31 23:00'
Data variables:
    rr1      (time, y, x) float32 dask.array<chunksize=(201600, 99, 85), meta=np.ndarray>
    crs      (time) int32 -2147483647 -2147483647 ... -2147483647 -2147483647
Attributes:
    title:                     Réanalyse des lames d'eau COMEPHORE
    Conventions:               CF-1.6
    history:                   Wed Mar 24 13:39:53 2021: ncrcat 1997_rr.nc 19...
    nco_openmp_thread_number:  1

I want to extract the precipitation values for each station into separate .csv files.

I ran an initial test to extract the values for a single station, but it’s very slow. Here is a snippet of the code:

indice_x = 1205500
indice_y = 439500
precip_1205500_439500 = comephore_all.sel(x=indice_x, y=indice_y)
df_precip = precip_1205500_439500.to_dataframe(name='EH')[['EH','date']]
df_precip[['date', 'EH']].to_csv(output_path, index=False)

The issue is not with writing the CSV file but with extracting the precipitation values. The line "df_precip = precip_1205500_439500.to_dataframe(name='EH')[['EH', 'date']]" takes 40 minutes for a single station.

I also tried with dask:

df_precip = precip_1205500_439500.to_dask_dataframe()
df_precip=df_precip[['time', 'rr1']]
df_precip[['time', 'rr1']].to_csv(output_path)

But the line df_precip[['time', 'rr1']].to_csv(output_path) runs for so long that I have never seen it finish.

I think I’m not approaching the extraction of precipitation values correctly.

Thank you for your help!

Upvotes: 0

Views: 44

Answers (1)

Markus

Reputation: 196

I think you could make the process faster by skipping the conversion to a DataFrame and directly writing the data into a CSV file. This could look something like this:

# Load the single-cell series into memory once, instead of reading element by element
times = precip_1205500_439500.time.values
values = precip_1205500_439500.rr1.values
with open(output_path, "w") as f:
    f.write("date,EH\n")
    for t, v in zip(times, values):
        f.write(f"{t},{v:f}\n")

You mentioned that you want to do this for every location, so I assume you will loop over all locations. In that case, you might want to switch to index-based selection (isel or square brackets instead of sel) for even more speed.
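
A minimal sketch of what such an index-based loop could look like. The names nearest_index, export_stations, stations and out_dir are made up for illustration; ds is assumed to be an xarray Dataset laid out like comephore_all, with 1-D x, y and time coordinates and the rr1 variable. I have not run this against your file, so treat it as a starting point rather than a tested solution:

```python
import csv


def nearest_index(coords, target):
    """Return the integer position of the coordinate closest to target."""
    return min(range(len(coords)), key=lambda i: abs(coords[i] - target))


def export_stations(ds, stations, out_dir, var="rr1"):
    """Write one CSV per station from an xarray Dataset `ds`.

    `stations` is a hypothetical dict mapping a station name to its (x, y)
    position, expressed in the same units as the dataset coordinates.
    """
    xs = ds["x"].values.tolist()
    ys = ds["y"].values.tolist()
    times = ds["time"].values  # read the time axis once, reuse it for every station
    for name, (sx, sy) in stations.items():
        ix, iy = nearest_index(xs, sx), nearest_index(ys, sy)
        # isel selects by integer position; .values pulls the series into memory
        series = ds[var].isel(x=ix, y=iy).values
        with open(f"{out_dir}/{name}.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["date", "EH"])
            writer.writerows(zip(times, series))
```

You would call it with something like export_stations(comephore_all, {"station_a": (1205500, 439500)}, "out"), where the station name and the output folder are placeholders. Note that each .values call still has to read whatever chunks of the file contain that grid cell, so how much this helps depends on how the data is chunked.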

I hope this helps to speed up your data extraction.

Cheers, Markus

Upvotes: -1
