The main problem in this process is this line:
precip[:] = orig
which produces the error:
ValueError: cannot reshape array of size 5732784 into shape (39811,144,144)
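The numbers in the message already point at the cause. The DataFrame holds 39811 × 144 = 5,732,784 values, while a variable declared with dimensions (time, longitude, latitude) expects 39811 × 144 × 144 = 825,520,896. A minimal NumPy reproduction, using an empty array of the same shape as a stand-in for the CSV data (no real data involved):

```python
import numpy as np

n_time, n_stations = 39811, 144

# Stand-in for the CSV contents: one row per time step, one column per station.
data = np.empty((n_time, n_stations), dtype=np.float32)

# Shape of the netCDF variable declared as ('time', 'longitude', 'latitude').
target_shape = (n_time, n_stations, n_stations)

print(data.size)                   # 5732784 values available
print(int(np.prod(target_shape)))  # 825520896 values expected

message = ""
try:
    data.reshape(target_shape)
except ValueError as err:
    message = str(err)
print(message)
```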
I have two CSV files. The first contains the actual data for one variable (precipitation), with one column per station; the second contains the corresponding station coordinates. My sample data is on Google Drive if you want to look at it. The first CSV file has shape (39811, 144) and the second has shape (171, 10), but note that I only use a sliced DataFrame of shape (144, 2) from the second one.
This is the code:
stations = pd.read_csv(stn_precip)
stncoords = stations.iloc[:,[0,1]][:144]
orig = pd.read_csv(orig_precip, skiprows = 1, names = stations['Code'][:144])
lons = stncoords['X']
lats = stncoords['Y']
ncout = netCDF4.Dataset('Precip_1910-2018_homomod.nc', 'w')
ncout.createDimension('longitude',lons.shape[0])
ncout.createDimension('latitude',lats.shape[0])
ncout.createDimension('precip',orig.shape[1])
ncout.createDimension('time',orig.shape[0])
lons_out = lons.tolist()
lats_out = lats.tolist()
time_out = orig.index.tolist()
lats = ncout.createVariable('latitude',np.dtype('float32').char,('latitude',))
lons = ncout.createVariable('longitude',np.dtype('float32').char,('longitude',))
time = ncout.createVariable('time',np.dtype('float32').char,('time',))
precip = ncout.createVariable('precip',np.dtype('float32').char,('time', 'longitude','latitude'))
lats[:] = lats_out
lons[:] = lons_out
time[:] = time_out
precip[:] = orig
ncout.close()
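A miniature in-memory version of the two files can help check how the slicing lines up data columns with stations. The file contents, station codes and coordinates below are hypothetical; only the column names X, Y and Code come from the code above. The sketch uses .iloc for both slices, which selects the same rows and columns as stations.iloc[:,[0,1]][:144] without relying on positional [] slicing of a Series.

```python
import io

import pandas as pd

# Hypothetical miniature of the coordinates file: X and Y first, then other columns.
stations_csv = io.StringIO(
    "X,Y,Code,Name\n"
    "150.1,-33.9,S001,Alpha\n"
    "150.4,-34.2,S002,Beta\n"
    "151.0,-33.5,S003,Gamma\n"
    "151.3,-33.1,S004,Delta\n"
)
# Hypothetical miniature of the data file: a header row, then one row per time step.
orig_csv = io.StringIO(
    "a,b,c\n"
    "0.0,1.2,0.4\n"
    "3.1,0.0,0.0\n"
)

n_used = 3  # the question uses the first 144 stations; 3 here

stations = pd.read_csv(stations_csv)
# X and Y of the first n_used stations.
stncoords = stations.iloc[:n_used, [0, 1]]
# skiprows=1 drops the file's own header; names= relabels columns with station codes.
orig = pd.read_csv(orig_csv, skiprows=1,
                   names=stations["Code"].iloc[:n_used].tolist())

print(stncoords.shape)     # (3, 2)
print(orig.shape)          # (2, 3): rows are time steps, columns are stations
print(list(orig.columns))  # ['S001', 'S002', 'S003']
```

The data table is two-dimensional (time × station), which is the shape the netCDF variable has to match.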
I'm mostly basing my code on this post: convert-csv-to-netcdf, but it does not include 'time' as a third dimension, and that's where I'm failing. I expected the precipitation variable to have shape (39811, 144, 144), but the error suggests otherwise.
I'm not sure how to deal with this; any input is appreciated.
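A (time, longitude, latitude) variable describes a full grid: every longitude is paired with every latitude, so 144 stations would become 144 × 144 cells per time step, almost all of them empty. The toy sketch below, with three made-up station values at a single time step, places each station into such a grid; station i lands at longitude index i and latitude index i, so only the diagonal is filled.

```python
import numpy as np

# Three hypothetical stations: one precipitation value each at a single time step.
values = np.array([1.5, 0.0, 4.2])
n = values.size

# A (longitude, latitude) grid indexed by station order, initially all NaN.
grid = np.full((n, n), np.nan)
for i, v in enumerate(values):
    grid[i, i] = v  # station i has longitude index i and latitude index i

filled = np.count_nonzero(~np.isnan(grid))
print(filled, grid.size)  # 3 9
```

With 144 stations, that is 144 useful cells out of 20,736 per time step, which is why a one-dimensional station axis is the better fit for this data.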
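As you have data from different stations, I would suggest using a station dimension for your netCDF file instead of separate lon and lat dimensions. Declaring precip with ('time', 'longitude', 'latitude') asks for 39811 × 144 × 144 values, but your DataFrame only holds 39811 × 144 (one value per station per time step), which is why the reshape fails. You can, of course, still save the longitude and latitude of each station in separate variables.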
Here is one possible solution, using your code as an example:
#!/usr/bin/env ipython
import pandas as pd
import numpy as np
import netCDF4
stn_precip='Precip_1910-2018_stations.csv'
orig_precip='Precip_1910-2018_origvals.csv'
stations = pd.read_csv(stn_precip)
stncoords = stations.iloc[:,[0,1]][:144]
orig = pd.read_csv(orig_precip, skiprows = 1, names = stations['Code'][:144])
lons = stncoords['X']
lats = stncoords['Y']
# One entry per station along the new 'station' dimension
nstations = np.size(lons)
ncout = netCDF4.Dataset('Precip_1910-2018_homomod.nc', 'w')
ncout.createDimension('station',nstations)
ncout.createDimension('time',orig.shape[0])
lons_out = lons.tolist()
lats_out = lats.tolist()
time_out = orig.index.tolist()
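# Latitude and longitude become per-station variables instead of dimensions
lats = ncout.createVariable('latitude',np.dtype('float32').char,('station',))
lons = ncout.createVariable('longitude',np.dtype('float32').char,('station',))
time = ncout.createVariable('time',np.dtype('float32').char,('time',))
# precip is 2-D (time, station), matching the (39811, 144) table
precip = ncout.createVariable('precip',np.dtype('float32').char,('time', 'station'))
lats[:] = lats_out
lons[:] = lons_out
time[:] = time_out
# Write the DataFrame's values as a plain NumPy array
precip[:] = orig.to_numpy()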
ncout.close()
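You can check the structure of the output file with ncdump -h Precip_1910-2018_homomod.nc. With the code above it should list two dimensions, station (144) and time (39811), and the variables latitude(station), longitude(station), time(time) and precip(time, station).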