Guerre

Reputation: 45

Transforming a Dataframe for statsmodels t-test

I'm trying to run a t-test with pandas and statsmodels to compare performance between two groups, but I'm having trouble getting the data into a shape that statsmodels can use.

My pandas dataframe currently looks like this:

Treatment      Performance
a              2
b              3
a              2
a              1
b              0

And it's my understanding that to perform a t-test I need the data organized by treatment, like so:

TreatmentA    TreatmentB
2             3
2             0
1

This code almost does the trick:

cat1 = df.groupby('Treatment', as_index=False).groups['a']
cat2 = df.groupby('Treatment', as_index=False).groups['b']
print(ttest_ind(cat1, cat2))

But when I print the result, it contains the row indices where that treatment occurred rather than the performance values:

print(cat1)
[0, 2, 4, 5, 9, 10, 11, 16, 18,...131, 133, 142, 147, 152, 153, 156, 157, 158]

I think it needs to look more like this:

print(cat1)
[2, 2, 1, ...0, 3, 1, 1, 0, 2, 0, 0, 0]

What is the best way to convert this dataframe into a format that I can perform t-tests on?

Upvotes: 1

Views: 916

Answers (1)

lrnzcig

Reputation: 3947

I think the simplest way is to filter the rows for each treatment with a boolean mask and pass the two Performance columns to ttest_ind:

ttest_ind(df[df['Treatment'] == 'a']['Performance'], df[df['Treatment'] == 'b']['Performance'])
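For completeness, here is a minimal self-contained sketch built on the five sample rows from the question. It assumes ttest_ind is the one from statsmodels.stats.weightstats (the post does not say which import is used), and the variable names are only illustrative. It also shows why .groups returned indices: it maps each label to row index labels, whereas selecting the Performance column and calling get_group returns the values.

```python
import pandas as pd
from statsmodels.stats.weightstats import ttest_ind  # assumed import; the post does not show it

# Toy data copied from the sample table in the question
df = pd.DataFrame({
    "Treatment": ["a", "b", "a", "a", "b"],
    "Performance": [2, 3, 2, 1, 0],
})

# .groups maps each treatment label to the row index labels, not to the values
index_labels = list(df.groupby("Treatment").groups["a"])

# Select the Performance column first, then pull out each treatment's values
by_treatment = df.groupby("Treatment")["Performance"]
perf_a = by_treatment.get_group("a").to_numpy()
perf_b = by_treatment.get_group("b").to_numpy()

# The statsmodels version returns (t statistic, p-value, degrees of freedom)
tstat, pvalue, dof = ttest_ind(perf_a, perf_b)
print(tstat, pvalue, dof)
```

There is no need to reshape the frame into one column per treatment; the two groups can have different lengths, and each one is passed as its own array.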

Hope it helps.

Upvotes: 1
