Reputation: 83
I am trying to compute p-values for the slope obtained using xarray.polyfit() by extracting the covariance matrix and using a t-test. However, I am getting 1s for all p-values, which seems incorrect. The same datasets give valid p-values when using scipy.stats.linregress() for specific points but slow to apply to the whole dataset.
import numpy as np
import xarray as xr
from scipy import stats
# Generate synthetic data
np.random.seed(42)
lat, lon = np.arange(-90, 91, 10), np.arange(-180, 181, 20) # Coarse grid
time = np.arange(2000, 2021) # Yearly data from 2000 to 2020
# Create a random dataset with a slight trend
trend = 0.05 # Define a small increasing trend
noise_level = 0.1
data_values = trend * (time - 2000)[
None, None, :
] + noise_level * np.random.randn( # Linear trend component
len(lat), len(lon), len(time)
) # Random noise
# Create an xarray DataArray
data = xr.DataArray(
data_values,
coords={"lat": lat, "lon": lon, "time": time},
dims=["lat", "lon", "time"],
name="t",
)
ds = xr.Dataset({"t": data})
ds
def compute_trends(data: xr.Dataset):
trends = xr.Dataset()
# Sort by time to ensure correct order
data = data.sortby("time")
n = data.dims["time"] # Number of time points
ds = data.polyfit(dim="time", deg=1, cov=True)
slope = ds["t_polyfit_coefficients"].sel(degree=1)
slope_variance = ds["t_polyfit_covariance"].sel(
cov_i=1, cov_j=1
) # Variance of slope , i have tried with 0,0 but although the results are close to expected , they are not accurate so , i must be doing something wrong here.
stderr = np.sqrt(slope_variance)
t_stat = slope / stderr
p_values = 2 * (1 - stats.t.cdf(np.abs(t_stat), df=n - 2))
trends["slope"] = slope
trends["p_value"] = xr.DataArray(p_values, coords=slope.coords, dims=slope.dims)
return trends
# Compute trends
trends = compute_trends(ds)
print(trends.p_value)
I suspect an issue with how I’m extracting the covariance matrix (polyfit_covariance) or computing stderr. Am I correctly extracting the slope variance from polyfit_covariance?
Upvotes: -1
Views: 52