Reputation: 151
I need to calculate the plane of array (POA) irradiance using python's pvlib package (https://pvlib-python.readthedocs.io/en/stable/). For this I would like to use the output data from the WRF model (GHI, DNI, DHI). The output data is in netCDF format, which I open using the netCDF4 package and then I extract the necessary variables using the wrf-python package.
With that I get a xarray.Dataset with the variables I will use. I then use the xarray.Dataset.to_dataframe() method to transform it into a pandas dataframe, and then I transform the dataframe into a numpy array using the dataframe.values. And then I do a loop where in each iteration I calculate the POA using the function irradiance.get_total_irradiance (https://pvlib-python.readthedocs.io/en/stable/auto_examples/plot_ghi_transposition.html) for a grid point.
That's the way I've been doing it so far, however I have over 160000 grid points in the WRF domain, the data is hourly and spans 365 days. This gives a very large amount of data. I believe if pvlib could work directly with xarray.dataset it could be faster. However, I could only do it this way, transforming the data into a numpy.array and looping through the rows. Could anyone tell me how I can optimize this calculation? Because the code I developed is very time-consuming.
If anyone can help me with this I would appreciate it. Maybe an improvement to the code, or another way to calculate the POA from the WRF data...
I'm providing the code I've built so far:
from pvlib import location
from pvlib import irradiance
import os
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr
import netCDF4
import wrf
Getting WRF data
variaveis = ['T2',
'U10',
'V10',
'SWDDNI',
'SWDDIF',
'SWDOWN']
netcdf_data = netCDF4.Dataset('wrfout_d02_2003-11-01_00_00_00')
first = True
for v in variaveis:
var = wrf.getvar(netcdf_data, v, timeidx=wrf.ALL_TIMES)
if first:
met_data = var
first = False
else:
met_data = xr.merge([met_data, var])
met_data = xr.Dataset.reset_coords(met_data, ['XTIME'], drop=True)
met_data['T2'] = met_data['T2'] - 273.15
WS10 = (met_data['U10']**2 + met_data['V10']**2)**0.5
met_data['WS10'] = WS10
df = met_data[['SWDDIF',
'SWDDNI',
'SWDOWN',
'T2',
'WS10']].to_dataframe().reset_index().drop(columns=['south_north',
'west_east'])
df.rename(columns={'SWDOWN': 'ghi',
'SWDDNI':'dni',
'SWDDIF':'dhi',
'T2':'temp_air',
'WS10':'wind_speed',
'XLAT': 'lat',
'XLONG': 'lon',
'Time': 'time'}, inplace=True)
df.set_index(['time'], inplace=True)
df = df[df.ghi>0]
df.index = df.index.tz_localize('America/Recife')
Function to get POA irradiance
def get_POA_irradiance(lon, lat, date, dni, dhi, ghi, tilt=10, surface_azimuth=0):
site_location = location.Location(lat, lon, tz='America/Recife')
# Get solar azimuth and zenith to pass to the transposition function
solar_position = site_location.get_solarposition(times=date)
# Use the get_total_irradiance function to transpose the GHI to POA
POA_irradiance = irradiance.get_total_irradiance(
surface_tilt = tilt,
surface_azimuth = surface_azimuth,
dni = dni,
ghi = ghi,
dhi = dhi,
solar_zenith = solar_position['apparent_zenith'],
solar_azimuth = solar_position['azimuth'])
# Return DataFrame with only GHI and POA
return pd.DataFrame({'lon': lon,
'lat': lat,
'GHI': ghi,
'POA': POA_irradiance['poa_global']}, index=[date])
Loop in each row (time) of the array
array = df.reset_index().values
list_poa = []
def loop_POA():
for i in tqdm(range(len(array) - 1)):
POA = get_POA_irradiance(lon=array[i,6],
lat=array[i,7],
dni=array[i,2],
dhi=array[i,1],
ghi=array[i,3],
date=str(array[i,0]))
list_poa.append(POA)
return list_poa
poa_final = pd.concat(lista)
Upvotes: 4
Views: 599
Reputation: 3411
Thanks both for a good question and for using pvlib! You're right that pvlib is intended for modeling single locations and is not designed for use with xarray datasets, although some functions might coincidentally work with them.
I strongly suspect that the majority of the runtime you're seeing is for the solar position calculations. You could switch to a faster method (see the method
options here), as the default solar position method is very accurate but also quite slow when calculating bulk positions. Installing numba will help, but it still might be too slow for you, so you might check the other models (ephemeris, pyephem). There are also some fast but low-precision methods, but you will need to change your code a bit to use them. See the list under "Correlations and analytical expressions for low precision solar position calculations" here.
Like Michael Delgado suggests in the comments, parallel processing is an option. But that can be a headache in python. You will probably want multiprocessing, not multithreading.
Another idea is to use atlite, a python package designed for this kind of spatial modeling. But its solar modeling capabilities are not nearly as detailed as pvlib, so it might not be useful for your case.
One other note: I don't know if the WRF data are interval averages or instantaneous values, but if you care about accuracy you should handle them differently for transposition. See this example.
Edit to add: after looking at your code again, there might be another significant speedup to be had. Are you calling get_POA_irradiance
for single combinations of position and timestamp? If so, that is unnecessary and very slow. It would be much faster to pass in the full time series for each location, i.e. scalar lat/lon but vector irradiance.
Upvotes: 2