Reputation: 69
I have the dataset below with the number of car theft occurrences per day of the week in 2018, and I want to use a chi-square goodness-of-fit test to check whether my data follow a Poisson distribution.
     DAY_WEEK  DATE        NUMBER_OF_OCCURRENCES
0    Monday    2018-01-01  82
1    Monday    2018-01-08  162
2    Monday    2018-01-15  147
3    Monday    2018-01-22  133
4    Monday    2018-01-29  176
...  ...       ...         ...
360  Sunday    2018-12-02  78
361  Sunday    2018-12-09  205
362  Sunday    2018-12-16  77
363  Sunday    2018-12-23  84
364  Sunday    2018-12-30  59
In my df each row corresponds to one occurrence of a day of the week during the year. Thus, the first row corresponds to the first Monday, the 52nd row corresponds to the 52nd Monday, and so on.
Can anyone shed some light on how to test for a Poisson distribution using chi-square in Python? I've been stuck on this issue for a few days and I haven't found a way to do it.
Thank you very much in advance!!!
Upvotes: 1
Views: 1117
Reputation: 77860
You need to summarize your data into categories: choose a reasonable bin width (e.g. 20 thefts) and count how many data points fall into each bin. Compare those observed counts against the counts expected under a Poisson distribution with the same mean. This is the comparison the chi-squared test performs.
Note that, to keep this statistically sound, you must choose your bin width before you compare to the expected values. Pick something that gives you a decent quantity of values in the modal bin, and tails off at a convenient rate.
Also, truncate the chi-squared test on the right: stop after one or two bins that contain 0 or 1 items and have an expected value comfortably below 1.
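The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not a definitive implementation: the function name `poisson_chi2` and the default `bin_width=20` are hypothetical, and instead of cutting off the right tail it folds both Poisson tails into the outermost bins so that the expected counts sum to the sample size. It also assumes one degree of freedom is lost for estimating the mean from the data.

```python
import numpy as np
from scipy import stats

def poisson_chi2(counts, bin_width=20):
    """Chi-squared goodness-of-fit of integer counts to a Poisson distribution."""
    counts = np.asarray(counts)
    mu = counts.mean()  # Poisson mean estimated from the data

    # Bin edges depend only on the data range and the pre-chosen bin width.
    lo = (counts.min() // bin_width) * bin_width
    hi = (counts.max() // bin_width + 1) * bin_width
    edges = np.arange(lo, hi + bin_width, bin_width)
    observed, _ = np.histogram(counts, bins=edges)

    # For integer X: P(a <= X < b) = cdf(b - 1) - cdf(a - 1).
    probs = np.diff(stats.poisson.cdf(edges - 1, mu))
    # Assumption: fold the tails into the outer bins so probabilities sum to 1.
    probs[0] += stats.poisson.cdf(edges[0] - 1, mu)
    probs[-1] += stats.poisson.sf(edges[-1] - 1, mu)
    expected = probs * counts.size

    chi2_stat = ((observed - expected) ** 2 / expected).sum()
    # Degrees of freedom: bins - 1, minus 1 for the estimated mean.
    dof = len(observed) - 2
    p_value = stats.chi2.sf(chi2_stat, dof)
    return {"observed": observed, "expected": expected,
            "chi2": chi2_stat, "dof": dof, "p_value": p_value}
```

With the question's data this might be called as `poisson_chi2(df["NUMBER_OF_OCCURRENCES"])`. A small p-value suggests the counts are not well described by a single Poisson distribution. The function needs at least three bins for the degrees of freedom to be positive.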
Handling the days of the week individually works the same way, just with more categories. You have a separate series of bins for each day of the week. You can use the same bin width for all days, or adjust it according to that day's traffic intensity.
For the sake of illustration, let's assume that 6 bins turn out to be convenient for each day. This will give you 42 categories (6 bins/day * 7 days) for your chi-squared test.
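One possible way to carry out the per-day version with pandas is sketched below. The helper names `binned_chi2` and `chi2_by_weekday` are hypothetical. The sketch assumes a separate Poisson mean is fitted for each day, so each day contributes its number of bins minus 2 degrees of freedom (with 6 bins per day that would be 42 - 14 = 28), and it uses the same bin width for every day.

```python
import numpy as np
import pandas as pd
from scipy import stats

def binned_chi2(counts, bin_width):
    """Return (chi-squared statistic, number of bins) for one series of counts."""
    counts = np.asarray(counts)
    mu = counts.mean()
    lo = (counts.min() // bin_width) * bin_width
    hi = (counts.max() // bin_width + 1) * bin_width
    edges = np.arange(lo, hi + bin_width, bin_width)
    observed, _ = np.histogram(counts, bins=edges)
    # Poisson probability of each bin, with both tails folded into the outer bins.
    probs = np.diff(stats.poisson.cdf(edges - 1, mu))
    probs[0] += stats.poisson.cdf(edges[0] - 1, mu)
    probs[-1] += stats.poisson.sf(edges[-1] - 1, mu)
    expected = probs * counts.size
    return ((observed - expected) ** 2 / expected).sum(), len(observed)

def chi2_by_weekday(df, bin_width=20):
    """Pool the per-day chi-squared statistics into one overall test."""
    total_stat, total_dof = 0.0, 0
    for _, group in df.groupby("DAY_WEEK"):
        stat, n_bins = binned_chi2(group["NUMBER_OF_OCCURRENCES"], bin_width)
        total_stat += stat
        # Assumption: lose 1 dof for the bin total and 1 for that day's mean.
        total_dof += n_bins - 2
    return total_stat, total_dof, stats.chi2.sf(total_stat, total_dof)
```

Calling `chi2_by_weekday(df)` on the question's DataFrame would return the pooled statistic, its degrees of freedom, and the p-value.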
Upvotes: 2