Simd
Simd

Reputation: 21343

Test for Poisson process

I would like to run some tests for the null hypothesis that the times of events I have was created from a homogeneous Poisson process (see e.g. http://en.wikipedia.org/wiki/Poisson_process ). For a fixed number of events the times should therefore look like a sorted version of a uniform distribution in the appropriate range. There is an implementation of the Kolmogorov-Smirnov test at http://docs.scipy.org/doc/scipy-0.7.x/reference/generated/scipy.stats.kstest.html but I can't see how to use it here as scipy.stats doesn't seem to know about Poisson processes.

As a simple example, this sample data should give a high p-value for any such test.

import random
nopoints = 100
max = 1000

points = sorted([random.randint(0,max) for j in xrange(nopoints)])

How can I make a sensible test for this problem?

From www.stat.wmich.edu/wang/667/classnotes/pp/pp.pdf‎ I see

" REMARK 6.3 ( TESTING POISSON ) The above theorem may also be used to test the hypothesis that a given counting process is a Poisson process. This may be done by observing the process for a fixed time t. If in this time period we observed n occurrences and if the process is Poisson, then the unordered occurrence times would be independently and uniformly distributed on (0, t]. Hence, we may test if the process is Poisson by testing the hypothesis that the n occurrence times come from a uniform (0, t] population. This may be done by standard statistical procedures such as the Kolmogorov-Smirov test."

Upvotes: 6

Views: 7088

Answers (3)

Josef
Josef

Reputation: 22897

Warning: quickly written, some details are not verified

what's the appropriate estimator for exponential, degrees of freedom for chisquare test

based on lecture notes

The implications of homogeneity is not rejected with any of the three tests. Illustrates how to use kstest and chisquare test of scipy.stats

# -*- coding: utf-8 -*-
"""Tests for homogeneity of Poissson Process

Created on Tue Sep 17 13:50:25 2013

Author: Josef Perktold
"""

import numpy as np
from scipy import stats

# create an example dataset
nobs = 100
times_ia = stats.expon.rvs(size=nobs) # inter-arrival times
times_a = np.cumsum(times_ia) # arrival times
t_total = times_a.max()

# not used
#times_as = np.sorted(times_a)
#times_ia = np.diff(times_as)

bin_limits = np.array([ 0. ,  0.5,  1. ,  1.5,  2. ,  np.inf])
nfreq_ia, bins_ia = np.histogram(times_ia, bin_limits)


# implication: arrival times are uniform for fixed interval
# using times.max() means we don't really have a fixed interval
print stats.kstest(times_a, stats.uniform(0, t_total).cdf)

# implication: inter-arrival times are exponential
lambd = nobs * 1. / t_total
scale = 1. / lambd

expected_ia = np.diff(stats.expon.cdf(bin_limits, scale=scale)) * nobs
print stats.chisquare(nfreq_ia, expected_ia, ddof=1)

# implication: given total number of events, distribution of times is uniform
# binned version
n_mean_bin = 10
n_bins_a = nobs // 10
bin_limits_a = np.linspace(0, t_total+1e-7, n_bins_a + 1)
nfreq_a, bin_limits_a = np.histogram(times_a, bin_limits_a)
# expect uniform distributed over every subinterval
expected_a = np.ones(n_bins_a) / n_bins_a * nobs
print stats.chisquare(nfreq_a, expected_a, ddof=1)

Upvotes: 4

Hooked
Hooked

Reputation: 88208

The KS test, when determining if two distributions differ, is simply the largest difference between them:

enter image description here

This is simple enough to calculate yourself. The program below computes the KS statistic for two Poisson processes with different parameter sets:

import numpy as np

N = 10**6
X  = np.random.poisson(10, size=N)
X2 = np.random.poisson(7, size=N)

bins = np.arange(0, 30,1)
H1,_ = np.histogram(X , bins=bins, normed=True)
H2,_ = np.histogram(X2, bins=bins, normed=True)

D = np.abs(H1-H2)

idx = np.argmax(D)
KS = D[idx]

# Plot the results
import pylab as plt
plt.plot(H1, lw=2,label="$F_1$")
plt.plot(H2, lw=2,label="$F_2$")
text = r"KS statistic, $\sup_x |F_1(x) - F_2(x)| = {KS:.4f}$"
plt.plot(D, '--k', label=text.format(KS=KS),alpha=.8)
plt.scatter([bins[idx],],[D[idx],],s=200,lw=0,alpha=.8,color='k')
plt.axis('tight')
plt.legend()

enter image description here

Upvotes: 2

Ofir Israel
Ofir Israel

Reputation: 3913

The problem is, as the doc you linked to suggests: "The KS test is only valid for continuous distributions." while a poisson distribution is discrete.

I would suggest to you maybe use the example in this link: http://nbviewer.ipython.org/urls/raw.github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/master/Chapter1_Introduction/Chapter1_Introduction.ipynb (look for "##### Example: Inferring behaviour from text-message data")

In that link, they check for the appropriate lambda(s) for a particular dataset which they assume distributes according to a poisson process.

Upvotes: 0

Related Questions