Reputation: 221
I have a NetCDF file with 10 years of gridded daily temperature data for the United States. I defined a baseline period consisting of just the first 5 years of data. I now want to find the 90th percentile for each calendar day of that baseline period at each grid point, using all 5 years of data (i.e. the 90th percentile of all Jan 1 values, all Jan 2 values, all Jan 3 values, etc. for every grid point). I tried applying the quantile function, but I don't think I'm using it correctly.
Here's what my dataset looks like:
and here's what my code looks like:
#import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import xarray as xr
import requests
from datetime import date
#open NOAA gridded temperature netcdf file
df = xr.open_dataset('Tmax_1951-1960.nc')
#pull out maximum temperature variable
air=df.tmax
#select years up to and including 1955 for baseline period
Baseline=air[(air.time.dt.year <= 1955)]
#create year and day coordinates
Baseline['year']=Baseline.time.dt.year
Baseline['day']=Baseline.time.dt.strftime('%m-%d')
#calculate percentiles
Baseline['Percentile_90']=Baseline.quantile(0.9, dim='day')
But I get the error "ValueError: Dataset does not contain the dimensions: ['day']". How can I find the 90th percentile for each calendar day for each grid point?
Upvotes: 2
Views: 1142
Reputation: 221
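I needed to use groupby before applying the percentile calculation. I created a new monthday coordinate because the data include leap years, so I couldn't use dayofyear: after Feb 29, every calendar date gets a day-of-year number one higher than in a non-leap year, so the same date would fall into different groups.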
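To illustrate the leap-year problem, here is a minimal sketch using two example dates (chosen for illustration, not taken from the dataset). It compares dayofyear with a month*100+day key for March 1 in a non-leap year and in a leap year:

```python
import pandas as pd

# March 1 in a non-leap year (1951) and in a leap year (1952)
dates = pd.to_datetime(["1951-03-01", "1952-03-01"])

# dayofyear differs between the two years because of Feb 29
doy = dates.dayofyear.tolist()

# month*100 + day gives the same key for the same calendar date
md = (dates.month * 100 + dates.day).tolist()

print(doy)  # [60, 61]
print(md)   # [301, 301]
```

Grouping by dayofyear would therefore mix March 1 of one year with Feb 29 or March 2 of another, while the combined month/day key keeps calendar dates aligned.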
#import libraries
import pandas as pd
import json
import numpy as np
import matplotlib.pyplot as plt
import xarray as xr
import requests
from datetime import date
from matplotlib.backends.backend_pdf import PdfPages
import matplotlib.patches as patches
import datetime as dt
#open NASA GISS gridded temperature netcdf file
df = xr.open_dataset('Tmax_1951-1960.nc')
#select temperature dataset
air=df.tmax
#Create baseline period (years up to and including 1955)
Baseline=air.sel(time=slice(None, '1955-12-31'))
#create new monthday coordinate (e.g. 101 for Jan 1, 1231 for Dec 31)
monthday = Baseline.time.dt.month*100 + Baseline.time.dt.day
Baseline = Baseline.assign_coords(monthday=('time', monthday.values))
#Find 90th percentile of daily data
Per90 = Baseline.groupby('monthday').quantile(0.9)
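For anyone who wants to try the approach without the original file, the following is a rough sketch on synthetic data. The grid size, coordinate values, and random temperatures are all made up for illustration; only the groupby/quantile pattern mirrors the code above. It also cross-checks one calendar day against NumPy.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the real file: 5 years of daily data on a 2x3 grid
time = pd.date_range("1951-01-01", "1955-12-31", freq="D")
rng = np.random.default_rng(0)
tmax = xr.DataArray(
    rng.normal(20.0, 5.0, size=(time.size, 2, 3)),
    dims=("time", "lat", "lon"),
    coords={"time": time, "lat": [30.0, 40.0], "lon": [-100.0, -90.0, -80.0]},
    name="tmax",
)

# Calendar-day key: 101 for Jan 1, 229 for Feb 29, 1231 for Dec 31
monthday = tmax.time.dt.month * 100 + tmax.time.dt.day
tmax = tmax.assign_coords(monthday=("time", monthday.values))

# 90th percentile over all years, separately for each calendar day and grid point
per90 = tmax.groupby("monthday").quantile(0.9)

# Cross-check Jan 1 against a plain NumPy quantile over the five Jan 1 values
jan1_mask = (monthday == 101).values
expected_jan1 = np.quantile(tmax.values[jan1_mask], 0.9, axis=0)
matches = np.allclose(
    per90.sel(monthday=101).transpose("lat", "lon").values, expected_jan1
)
print(per90.sizes["monthday"], matches)
```

The result has one entry per calendar day along the new monthday dimension, including 229 for Feb 29. Note that with a 1951-1955 baseline only 1952 is a leap year, so the Feb 29 "percentile" is based on a single value per grid point.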
Upvotes: 2