Megan Martin

Reputation: 221

How to find daily percentiles of gridded data with xarray?

I have a NetCDF file with 10 years of gridded daily temperature data for the United States. I created a baseline period from just the first 5 years of data. I now want to find the 90th percentile for each calendar day of that baseline period, using all 5 years of data at each grid point (i.e. the 90th percentile of Jan 1, Jan 2, Jan 3, etc. for every grid point). I tried applying the quantile function, but I don't think I'm using it correctly.

Here's what my dataset looks like: [image not included]

and here's what my code looks like:

#import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import xarray as xr
import requests
from datetime import date

#open NOAA gridded temperature netcdf file
df = xr.open_dataset('Tmax_1951-1960.nc')

#pull out maximum temperature variable
air=df.tmax

#select years up to and including 1955 for baseline period
Baseline=air[(air.time.dt.year <= 1955)]

#create year and day coordinates
Baseline['year']=Baseline.time.dt.year
Baseline['day']=Baseline.time.dt.strftime('%m-%d')

#calculate percentiles
Baseline['Percentile_90']=Baseline.quantile(0.9, dim='day')

But I get the error "ValueError: Dataset does not contain the dimensions: ['day']". How can I find the 90th percentile for each calendar day for each grid point?

Upvotes: 2

Views: 1142

Answers (1)

Megan Martin

Reputation: 221

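I needed to group the data with groupby before applying the percentile calculation. I created a new monthday coordinate (month*100 + day) because I had leap years and couldn't use dayofyear: after Feb 29, every date in a leap year has a day-of-year one higher than the same date in other years, so the groups would not line up by calendar day.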
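To illustrate the leap-year problem, here is a minimal sketch using two hand-picked dates (they are illustrative, not taken from the dataset). It compares dayofyear with a month*100 + day key for March 1 in a common year and in a leap year.

```python
import pandas as pd

# March 1 in a common year (1951) and in a leap year (1952)
dates = pd.to_datetime(["1951-03-01", "1952-03-01"])

# dayofyear shifts by one after Feb 29 in the leap year
day_of_year = dates.dayofyear.tolist()

# month*100 + day gives the same key for the same calendar date
month_day = (dates.month * 100 + dates.day).tolist()

print(day_of_year)  # [60, 61]
print(month_day)    # [301, 301]
```

Grouping by dayofyear would therefore mix March 1 of one year with Feb 29 or March 2 of another, while the month*100 + day key keeps calendar dates together.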

#import libraries
import numpy as np
import xarray as xr

#open NASA GISS gridded temperature netcdf file
df = xr.open_dataset('Tmax_1951-1960.nc')

#select temperature dataset
air=df.tmax

#Create baseline period (1951 through the end of 1955, i.e. the first 5 years)
Baseline=air.sel(time=slice('1951', '1955'))

#create new monthday coordinate
monthday = xr.DataArray(Baseline.time.dt.month*100+Baseline.time.dt.day,name='monthday', dims='time', coords={'time':Baseline['time']})
Baseline['monthday'] = monthday

#Find 90th percentile of daily data
Per90 = Baseline.groupby('monthday').quantile(0.9)
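As a quick sanity check, the same approach can be tried on synthetic data. The sketch below is only an assumption-laden stand-in: the random values, the 2x3 grid, and the lat/lon coordinate names are made up for illustration and are not the real file's contents.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the baseline: 5 years of daily values on a 2x3 grid
# (random numbers and made-up lat/lon values, not the real dataset)
time = pd.date_range("1951-01-01", "1955-12-31", freq="D")
rng = np.random.default_rng(0)
tmax = xr.DataArray(
    rng.normal(20.0, 5.0, size=(time.size, 2, 3)),
    dims=("time", "lat", "lon"),
    coords={"time": time, "lat": [40.0, 41.0], "lon": [-100.0, -99.0, -98.0]},
    name="tmax",
)

# Calendar-day key (101 for Jan 1, 1231 for Dec 31), attached along time
monthday = tmax.time.dt.month * 100 + tmax.time.dt.day
tmax = tmax.assign_coords(monthday=("time", monthday.data))

# 90th percentile for each calendar day at each grid point
per90 = tmax.groupby("monthday").quantile(0.9)

print(per90.dims)
print(per90.sizes["monthday"])  # 366: 365 calendar days plus Feb 29 from 1952
```

The result has one value per calendar day and grid point. Note that with a 1951-1955 baseline, the Feb 29 group (monthday 229) contains only a single day, so its "percentile" is just that day's value.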

Upvotes: 2
