Robson Barreto

Reputation: 151

Optimize plane of array (POA) irradiance calculation using WRF (netCDF) data

I need to calculate the plane of array (POA) irradiance using Python's pvlib package (https://pvlib-python.readthedocs.io/en/stable/). For this I would like to use the output data from the WRF model (GHI, DNI, DHI). The output data is in netCDF format, which I open with the netCDF4 package, and then I extract the necessary variables with the wrf-python package.

This gives me an xarray.Dataset with the variables I will use. I then use the xarray.Dataset.to_dataframe() method to convert it into a pandas DataFrame, and convert that DataFrame into a NumPy array via dataframe.values. Finally, I loop over the rows, calculating the POA for one grid point per iteration with the function irradiance.get_total_irradiance (https://pvlib-python.readthedocs.io/en/stable/auto_examples/plot_ghi_transposition.html).

That's the way I've been doing it so far; however, the WRF domain has over 160,000 grid points and the data is hourly over 365 days, which is a very large amount of data. I believe it could be faster if pvlib could work directly with an xarray.Dataset, but so far I have only managed to do it this way, converting the data to a numpy.array and looping through the rows. Could anyone tell me how I can optimize this calculation? The code I developed is very time-consuming.

If anyone can help me with this I would appreciate it, whether it's an improvement to the code or another way to calculate the POA from the WRF data.

I'm providing the code I've built so far:

from pvlib import location
from pvlib import irradiance

import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr
import netCDF4
import wrf
from tqdm import tqdm

Getting WRF data

variaveis = ['T2',
             'U10',
             'V10',
             'SWDDNI',
             'SWDDIF',
             'SWDOWN']

netcdf_data = netCDF4.Dataset('wrfout_d02_2003-11-01_00_00_00')

first = True

for v in variaveis:

    var = wrf.getvar(netcdf_data, v, timeidx=wrf.ALL_TIMES)
    
    if first:
        met_data = var
        first = False
    else:
        met_data = xr.merge([met_data, var])

met_data = xr.Dataset.reset_coords(met_data, ['XTIME'], drop=True)
met_data['T2'] = met_data['T2'] - 273.15  # convert 2 m temperature from K to °C

# 10 m wind speed from its U and V components
WS10 = (met_data['U10']**2 + met_data['V10']**2)**0.5
met_data['WS10'] = WS10

df = met_data[['SWDDIF', 
               'SWDDNI', 
               'SWDOWN', 
               'T2', 
               'WS10']].to_dataframe().reset_index().drop(columns=['south_north', 
                                                                   'west_east'])

df.rename(columns={'SWDOWN': 'ghi',
                   'SWDDNI':'dni', 
                   'SWDDIF':'dhi', 
                   'T2':'temp_air', 
                   'WS10':'wind_speed',
                   'XLAT': 'lat',
                   'XLONG': 'lon',
                   'Time': 'time'}, inplace=True)
df.set_index(['time'], inplace=True)

df = df[df.ghi > 0]  # keep only daytime rows
df.index = df.index.tz_localize('America/Recife')

Function to get POA irradiance

def get_POA_irradiance(lon, lat, date, dni, dhi, ghi, tilt=10, surface_azimuth=0):

    site_location = location.Location(lat, lon, tz='America/Recife')

    # Get solar azimuth and zenith to pass to the transposition function
    solar_position = site_location.get_solarposition(times=date)
    
    # Use the get_total_irradiance function to transpose the GHI to POA
    POA_irradiance = irradiance.get_total_irradiance(
        surface_tilt = tilt,
        surface_azimuth = surface_azimuth,
        dni = dni,
        ghi = ghi,
        dhi = dhi,
        solar_zenith = solar_position['apparent_zenith'],
        solar_azimuth = solar_position['azimuth'])
    
    # Return DataFrame with only GHI and POA
    
    return pd.DataFrame({'lon': lon,
                         'lat': lat,
                         'GHI': ghi,
                         'POA': POA_irradiance['poa_global']}, index=[date])

Loop over each row (time) of the array

array = df.reset_index().values

list_poa = []

def loop_POA():
    # one call per row, i.e. per grid point and timestamp
    for i in tqdm(range(len(array))):
        POA = get_POA_irradiance(lon=array[i, 6],
                                 lat=array[i, 7],
                                 dni=array[i, 2],
                                 dhi=array[i, 1],
                                 ghi=array[i, 3],
                                 date=str(array[i, 0]))
        list_poa.append(POA)

    return list_poa

poa_final = pd.concat(loop_POA())

Upvotes: 4

Views: 599

Answers (1)

kevinsa5

Reputation: 3411

Thanks both for a good question and for using pvlib! You're right that pvlib is intended for modeling single locations and is not designed for use with xarray datasets, although some functions might coincidentally work with them.

I strongly suspect that the majority of the runtime you're seeing is for the solar position calculations. You could switch to a faster method (see the method options here), as the default solar position method is very accurate but also quite slow when calculating bulk positions. Installing numba will help, but it still might be too slow for you, so you might check the other models (ephemeris, pyephem). There are also some fast but low-precision methods, but you will need to change your code a bit to use them. See the list under "Correlations and analytical expressions for low precision solar position calculations" here.
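For example, something like this (a minimal sketch with made-up coordinates; the method keyword is simply forwarded to pvlib.solarposition.get_solarposition, and 'ephemeris' trades a little accuracy for speed):

import pandas as pd
from pvlib import location

times = pd.date_range('2003-11-01', periods=24, freq='H', tz='America/Recife')
site = location.Location(-8.05, -34.9, tz='America/Recife')  # example coordinates

# default method ('nrel_numpy'): accurate but comparatively slow for bulk calculations
solpos_default = site.get_solarposition(times)

# faster alternative; other options include 'pyephem' and 'nrel_numba' (requires numba)
solpos_fast = site.get_solarposition(times, method='ephemeris')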

Like Michael Delgado suggests in the comments, parallel processing is an option. But that can be a headache in Python. You will probably want multiprocessing, not multithreading.
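A rough sketch of that idea, reusing your get_POA_irradiance and array exactly as defined above (the column indices are copied from your loop; chunksize just reduces inter-process overhead, and on Windows/macOS the worker has to be importable from the main module):

import multiprocessing as mp

def poa_for_row(row):
    # one row of `array`: a single grid point at a single timestamp
    return get_POA_irradiance(lon=row[6], lat=row[7],
                              dni=row[2], dhi=row[1], ghi=row[3],
                              date=str(row[0]))

if __name__ == '__main__':
    with mp.Pool() as pool:  # defaults to one worker per CPU core
        results = pool.map(poa_for_row, array, chunksize=1000)
    poa_final = pd.concat(results)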

Another idea is to use atlite, a Python package designed for this kind of spatial modeling. But its solar modeling capabilities are not nearly as detailed as pvlib's, so it might not be useful for your case.

One other note: I don't know if the WRF data are interval averages or instantaneous values, but if you care about accuracy you should handle them differently for transposition. See this example.
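For instance, if the hourly values turn out to be averages labeled at the end of each interval, a common fix is to evaluate solar position at the interval midpoint and then re-align it with the irradiance labels. This is only a sketch under that assumed labeling convention (with made-up coordinates), so check the WRF output documentation first:

import pandas as pd
from pvlib import location

# hypothetical: hourly-average irradiance labeled at the END of each interval
times = pd.date_range('2003-11-01 01:00', periods=24, freq='H', tz='America/Recife')
site = location.Location(-8.05, -34.9, tz='America/Recife')  # example coordinates

# evaluate solar position at the interval midpoints, then re-align with the labels
solpos = site.get_solarposition(times - pd.Timedelta('30min'))
solpos.index = times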

Edit to add: after looking at your code again, there might be another significant speedup to be had. Are you calling get_POA_irradiance for single combinations of position and timestamp? If so, that is unnecessary and very slow. It would be much faster to pass in the full time series for each location, i.e. scalar lat/lon but vector irradiance.
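Something along these lines (a sketch; it groups your df by grid point so each pvlib call gets the whole hourly series at once, and it builds the result index from that series rather than from a single date, so the return differs slightly from your function):

list_poa = []

# one call per grid point, passing the full time series for that location
for (lat, lon), point_df in df.groupby(['lat', 'lon']):
    site = location.Location(lat, lon, tz='America/Recife')
    solpos = site.get_solarposition(times=point_df.index)
    poa = irradiance.get_total_irradiance(
        surface_tilt=10,
        surface_azimuth=0,
        dni=point_df['dni'],
        ghi=point_df['ghi'],
        dhi=point_df['dhi'],
        solar_zenith=solpos['apparent_zenith'],
        solar_azimuth=solpos['azimuth'])
    list_poa.append(pd.DataFrame({'lon': lon,
                                  'lat': lat,
                                  'GHI': point_df['ghi'],
                                  'POA': poa['poa_global']},
                                 index=point_df.index))

poa_final = pd.concat(list_poa)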

Upvotes: 2
