Having Trouble Visualizing a T-Distribution in Python

Question

I'm attempting to add a simple t-score visualization to some analysis utilities I'm writing (plotting scipy's probability density function, pdf, over an interval). In this example, I'm plotting a Student's t distribution, along with critical t-score cutoffs for a given problem set. However, the visualization isn't turning out correctly so far as I can tell.

In this example, I have an n=24 dataset and I'm trying to visualize an alpha=0.05 two-tailed test for it (i.e., statistical significance indicated by 2.5% in either tail of the distribution). I would expect the critical t-score to intersect the t-distribution at a y (probability) value of 0.025, but the t-distribution itself seems to be scaled or flattened by some amount.

So far as I can tell, the t distribution just doesn't match up with what I would expect the actual probabilities to be, but the setup is simple enough that I can't tell where I'm going wrong. I am somewhat new to statistics and am wondering if I'm missing something fundamental here?

## Basic T-Distribution
import scipy.stats as st
import matplotlib.pyplot as plt
import numpy as np

## Setup      
dof = 23        # Degrees of freedom
alpha = 0.05    # Significance level
ntails = 2      # Number of tails 

## Calculate critical t-score
tcrit = abs(st.t.ppf(alpha/ntails, dof))
# ±2.069

plt.figure()
xs = np.linspace(-10,10,1000)
plt.plot(xs, st.t.pdf(xs,dof), 'k', label="T-Distribution PDF")

## Plot some vertical lines representing critical t-score cutoff
critline = np.linspace(0,alpha/ntails)  # y range for critical line, AKA probability from 0-p*
xs_1 = len(critline) * [-tcrit]         # X ranges for plotting
xs_2 = len(critline) * [tcrit]
plt.plot(xs_1, critline, 'r', label="-t* for dof=23")
plt.plot(xs_2, critline,'r', label="t* for dof=23")
plt.legend()

Robert Kern · Accepted Answer

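The PDF is a density. The Y axis is not in units of "probability" but "probability per unit of X". The 0.025 you are expecting is the area under the curve beyond tcrit, not the height of the curve at tcrit. Evaluate the PDF at tcrit to get the appropriate value to match the curve.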

Try this to plot the vertical lines:

plt.vlines([-tcrit, tcrit], 0.0, st.t.pdf(tcrit, dof), colors='r')
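To see the difference between the height of the curve and the tail probability, here is a minimal sketch that reuses the question's `dof`, `alpha`, and `ntails`. The names `height`, `tail_area`, and `tail_area_numeric` are just illustrative, and the numerical integration with `scipy.integrate.quad` is only there as a cross-check, not something the original code needs.

```python
import numpy as np
import scipy.stats as st
from scipy.integrate import quad

dof = 23        # degrees of freedom (n = 24 in the question)
alpha = 0.05    # significance level
ntails = 2      # two-tailed test

# Critical t-score: the x value that leaves alpha/ntails of the area in the upper tail
tcrit = st.t.ppf(1 - alpha / ntails, dof)

# Height of the curve at tcrit (a density, in probability per unit of x)
height = st.t.pdf(tcrit, dof)

# Area under the curve to the right of tcrit (an actual probability)
tail_area = st.t.sf(tcrit, dof)

# Same area, obtained by numerically integrating the PDF from tcrit to infinity
tail_area_numeric, _ = quad(st.t.pdf, tcrit, np.inf, args=(dof,))

print(f"tcrit = {tcrit:.4f}")
print(f"pdf height at tcrit = {height:.4f}")
print(f"upper-tail area = {tail_area:.4f}")
print(f"upper-tail area (integrated) = {tail_area_numeric:.4f}")
```

The tail area should come out at alpha/ntails, while the height at tcrit is a different number. If you want the 2.5% to be visible on the plot, one option is to shade the tails with `plt.fill_between` over the part of `xs` beyond `tcrit` rather than drawing a line up to 0.025.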
