Jarad

Reputation: 18953

Scipy Stats ttest_1samp Hypothesis Testing For Comparing Previous Performance To Sample

My Problem I'm Trying To Solve

I have 11 months' worth of performance data:

        Month  Branded  Non-Branded  Shopping  Grand Total
0    2/1/2015     1330          334       161         1825
1    3/1/2015     1344          293       197         1834
2    4/1/2015      899          181       190         1270
3    5/1/2015      939          208       154         1301
4    6/1/2015     1119          238       179         1536
5    7/1/2015      859          238       170         1267
6    8/1/2015      996          340       183         1519
7    9/1/2015     1138          381       172         1691
8   10/1/2015     1093          395       176         1664
9   11/1/2015     1491          426       199         2116
10  12/1/2015     1539          530       156         2225

Let's say it's February 1, 2016, and I'm asking: "Are the results in January statistically different from the past 11 months?"

       Month  Branded  Non-Branded  Shopping  Grand Total
11  1/1/2016     1064          408       106         1578

I came across a blog...

I came across iaingallagher's blog. I will reproduce the relevant part here (in case the blog goes down).

1-sample t-test

The 1-sample t-test is used when we want to compare a sample mean to a population mean (which we already know). The average British man is 175.3 cm tall. A survey recorded the heights of 10 UK men and we want to know whether the mean of the sample is different from the population mean.

# 1-sample t-test
from scipy import stats
one_sample_data = [177.3, 182.7, 169.6, 176.3, 180.3, 179.4, 178.5, 177.2, 181.8, 176.5]

one_sample = stats.ttest_1samp(one_sample_data, 175.3)

print("The t-statistic is %.3f and the p-value is %.3f." % (one_sample.statistic, one_sample.pvalue))

Result:

The t-statistic is 2.296 and the p-value is 0.047.
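As a cross-check (not part of the blog post), the statistic above can be reproduced by hand from the usual one-sample formula, t = (sample mean - population mean) / (s / sqrt(n)), where s is the sample standard deviation with n - 1 in the denominator. The variable names below are illustrative only.

```python
import math
import statistics

from scipy import stats

one_sample_data = [177.3, 182.7, 169.6, 176.3, 180.3, 179.4, 178.5, 177.2, 181.8, 176.5]
pop_mean = 175.3  # known population mean from the blog example

n = len(one_sample_data)
sample_mean = statistics.mean(one_sample_data)
sample_sd = statistics.stdev(one_sample_data)  # sample SD (n - 1 denominator)

# One-sample t-statistic: distance of the sample mean from the
# population mean, in units of the standard error of the mean.
t_manual = (sample_mean - pop_mean) / (sample_sd / math.sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom.
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

result = stats.ttest_1samp(one_sample_data, pop_mean)
print("manual: t = %.3f, p = %.3f" % (t_manual, p_manual))
print("scipy:  t = %.3f, p = %.3f" % (result.statistic, result.pvalue))
```

Both lines should agree, which makes it clear that the test is about the mean of the sample rather than about any individual value in it.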

Finally, to my question...

In iaingallagher's example, he knows the population mean and is comparing a sample (one_sample_data) to it. In MY example, I want to see if 1/1/2016 is statistically different from the previous 11 months. So in my case, the previous 11 months are an array (instead of a single population mean value) and my sample is one data point (instead of an array)... so it's kind of backwards.

QUESTION

If I was focused on the Shopping column data:

Will scipy.stats.ttest_1samp([161,197,190,154,179,170,183,172,176,199,156], 106) produce a valid result even though my sample (the first parameter) is a list of previous results and I'm comparing it to a popmean that is not the population mean but a single sample?

If this is not the correct stats function, any recommendation on what to use for this hypothesis test situation?
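For reference, a sketch of what that call would actually test. ttest_1samp treats its first argument as the sample and asks whether the sample's mean differs from popmean, so the null hypothesis here is "the mean of the 11 previous months equals 106". One commonly used alternative for comparing a single new observation with a small sample (an assumption here, not something from the blog) widens the standard error by sqrt(1 + 1/n), which amounts to asking whether the new value falls outside a prediction interval. It assumes the monthly values are roughly normal and independent, which may not hold for seasonal data. The variable names are illustrative.

```python
import math
import statistics

from scipy import stats

previous_shopping = [161, 197, 190, 154, 179, 170, 183, 172, 176, 199, 156]
january_2016 = 106

# The call from the question: tests whether the MEAN of the previous
# months equals 106, using the standard error of the mean (s / sqrt(n)).
mean_test = stats.ttest_1samp(previous_shopping, january_2016)

# Assumed alternative: treat January as one new observation and compare
# it with the previous months, using s * sqrt(1 + 1/n) as the scale.
n = len(previous_shopping)
mean_prev = statistics.mean(previous_shopping)
sd_prev = statistics.stdev(previous_shopping)  # sample SD (n - 1 denominator)
t_single = (january_2016 - mean_prev) / (sd_prev * math.sqrt(1 + 1 / n))
p_single = 2 * stats.t.sf(abs(t_single), df=n - 1)

print("ttest_1samp:        t = %.3f, p = %.6f" % (mean_test.statistic, mean_test.pvalue))
print("single observation: t = %.3f, p = %.6f" % (t_single, p_single))
```

The second statistic is smaller in magnitude because it allows for month-to-month variation in the new value as well as uncertainty in the historical mean.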

Upvotes: 2

Views: 13519

Answers (1)

Ammanuel

Reputation: 101

If you are only interested in the "Shopping" column, try creating a .xlsx or .csv file containing only the data from the "Shopping" column.

This way you can import the data with pandas and perform the same t-test for each column individually.

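import pandas as pd
from scipy import stats

# Load the spreadsheet and pull out the column to test.
data = pd.read_excel("datafile.xlsx")
one_sample_data = data["Shopping"]

# Compare the column's mean with the January 2016 Shopping value from the question.
one_sample = stats.ttest_1samp(one_sample_data, 106)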
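A minimal sketch of the per-column idea without an intermediate file, assuming the table from the question is typed in directly as a DataFrame. It runs ttest_1samp on each column against that column's January 2016 value. Keep in mind that this tests whether the mean of the previous months equals the January value, which may or may not be the hypothesis you want. The names history, january, and results are illustrative.

```python
import pandas as pd
from scipy import stats

# Previous 11 months, copied from the table in the question.
history = pd.DataFrame({
    "Branded": [1330, 1344, 899, 939, 1119, 859, 996, 1138, 1093, 1491, 1539],
    "Non-Branded": [334, 293, 181, 208, 238, 238, 340, 381, 395, 426, 530],
    "Shopping": [161, 197, 190, 154, 179, 170, 183, 172, 176, 199, 156],
    "Grand Total": [1825, 1834, 1270, 1301, 1536, 1267, 1519, 1691, 1664, 2116, 2225],
})

# January 2016 values, also from the question.
january = {"Branded": 1064, "Non-Branded": 408, "Shopping": 106, "Grand Total": 1578}

# Run the one-sample t-test column by column.
results = {}
for column in history.columns:
    results[column] = stats.ttest_1samp(history[column], january[column])
    print("%-12s t = %7.3f, p = %.4f"
          % (column, results[column].statistic, results[column].pvalue))
```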

Upvotes: 0
