Reputation: 18953
My Problem I'm Trying To Solve
I have 11 months worth of performance data:
Month Branded Non-Branded Shopping Grand Total
0 2/1/2015 1330 334 161 1825
1 3/1/2015 1344 293 197 1834
2 4/1/2015 899 181 190 1270
3 5/1/2015 939 208 154 1301
4 6/1/2015 1119 238 179 1536
5 7/1/2015 859 238 170 1267
6 8/1/2015 996 340 183 1519
7 9/1/2015 1138 381 172 1691
8 10/1/2015 1093 395 176 1664
9 11/1/2015 1491 426 199 2116
10 12/1/2015 1539 530 156 2225
Let's say it's February, 1 2016 and I'm asking "are the results in January statistically different from the past 11 months?"
Month Branded Non-Branded Shopping Grand Total
11 1/1/2016 1064 408 106 1578
I came across a blog...
I came across iaingallagher's blog. I will reproduce here (in case the blog goes down).
1-sample t-test
The 1-sample t-test is used when we want to compare a sample mean to a population mean (which we already know). The average British man is 175.3 cm tall. A survey recorded the heights of 10 UK men and we want to know whether the mean of the sample is different from the population mean.
# 1-sample t-test
from scipy import stats
one_sample_data = [177.3, 182.7, 169.6, 176.3, 180.3, 179.4, 178.5, 177.2, 181.8, 176.5]
one_sample = stats.ttest_1samp(one_sample_data, 175.3)
print "The t-statistic is %.3f and the p-value is %.3f." % one_sample
Result:
The t-statistic is 2.296 and the p-value is 0.047.
Finally, to my question...
In iaingallagher's example, he knows the population mean and is comparing a sample (one_sample_data
). In MY example, I want to see if 1/1/2016
is statistically different from the previous 11 months. So in my case, the previous 11 months is an array (instead of a single population mean value) and my sample is one data point (instead of an array)... so it's kind of backwards.
QUESTION
If I was focused on the Shopping
column data:
Will scipy.stats.
ttest_1samp([161,197,190,154,179,170,183,172,176,199,156], 106)
produce a valid result even though my sample (first parameters) is a list of previous results and I'm comparing it to a popmean
that's not the population mean but instead one sample.
If this is not the correct stats function, any recommendation on what to use for this hypothesis test situation?
Upvotes: 2
Views: 13519
Reputation: 101
If you are only interested in the "Shopping"
column, try to create a .xlsx or .csv file containing the data from only the "Shopping"
column.
This way you could import this data and make use of pandas to perform the same T-test for each column individually.
import pandas as pd
from scipy import stats
data = pd.read_excel("datafile.xlxs")
one_sample_data = data["Shopping"]
one_sample = stats.ttest_1samp(one_sample_data, 175.3)
Upvotes: 0