Reputation: 35
I have a pandas DataFrame in the following format:
Variable A1 A2 B1 B2 C1 C2 D1 D2
X 2 3 5 6 13 12 3 3
Y 1 1 7 9 16 19 11 9
Z 3 4 6 6 2 3 53 48
Here A1-A2, B1-B2, etc. are replicate measurements, and X, Y, Z are the different variables being measured.
I would like to do a t-test between D1-D2 and B1-B2 for each row and then append a new column with the p-values from each comparison.
Desired result would be:
Variable A1 A2 B1 B2 C1 C2 D1 D2 p-val
X 2 3 5 6 13 12 3 3 0.0345
Y 1 1 7 9 16 19 11 9 0.111
Z 3 4 6 6 2 3 53 48 0.0004
Thank you in advance.
Upvotes: 0
Views: 723
Reputation: 1048
I think this approach is right, but I am getting somewhat different p-values from the ones in your desired result:
from scipy import stats

def my_func(x):
    # x is one row; compare the D replicates against the B replicates
    x["p-val"] = stats.ttest_ind([x.D1, x.D2], [x.B1, x.B2]).pvalue
    return x

df = df.apply(my_func, axis=1)
Output:
Variable A1 A2 B1 B2 C1 C2 D1 D2 p-val
X 2 3 5 6 13 12 3 3 0.037749551350623724
Y 1 1 7 9 16 19 11 9 0.29289321881345254
Z 3 4 6 6 2 3 53 48 0.0031413032318505603
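As a self-contained variant of the same row-wise idea, here is a minimal sketch that rebuilds the example data from the question and returns only the p-value from the helper, rather than modifying and returning the whole row. The helper name row_pvalue is hypothetical; everything else is standard pandas and SciPy.

```python
import pandas as pd
from scipy import stats

# Example data copied from the question.
df = pd.DataFrame({
    "Variable": ["X", "Y", "Z"],
    "A1": [2, 1, 3], "A2": [3, 1, 4],
    "B1": [5, 7, 6], "B2": [6, 9, 6],
    "C1": [13, 16, 2], "C2": [12, 19, 3],
    "D1": [3, 11, 53], "D2": [3, 9, 48],
})

def row_pvalue(row):
    # Hypothetical helper: two-sample t-test of the D replicates
    # against the B replicates for a single row.
    d = [row["D1"], row["D2"]]
    b = [row["B1"], row["B2"]]
    return stats.ttest_ind(d, b).pvalue

# One p-value per row, assigned as a new column.
df["p-val"] = df.apply(row_pvalue, axis=1)
print(df)
```

Because the helper returns a scalar, apply produces a Series of p-values and the original columns are left untouched.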
Upvotes: 0
Reputation: 51335
I get different results from yours (I can't tell how you ran your t-test), but you can use scipy.stats.ttest_ind
to run a t-test on two independent samples and extract the p-values from the result. The function returns the t-statistic and the p-value in that order, so the p-value is at index 1 (see the scipy.stats.ttest_ind documentation for details):
from scipy.stats import ttest_ind
df['p-val'] = ttest_ind(df[['B1', 'B2']], df[['D1', 'D2']], axis=1)[1]
>>> df
Variable A1 A2 B1 B2 C1 C2 D1 D2 p-val
0 X 2 3 5 6 13 12 3 3 0.037750
1 Y 1 1 7 9 16 19 11 9 0.292893
2 Z 3 4 6 6 2 3 53 48 0.003141
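One setting that may be worth checking when p-values disagree is the variance assumption: ttest_ind defaults to equal_var=True (Student's t-test), and passing equal_var=False runs Welch's t-test instead. Whether that explains the numbers in the question is unknown; the sketch below only shows how to compute both side by side. The column names p-val and p-val-welch are just labels chosen here.

```python
import pandas as pd
from scipy.stats import ttest_ind

# Example data copied from the question.
df = pd.DataFrame({
    "Variable": ["X", "Y", "Z"],
    "A1": [2, 1, 3], "A2": [3, 1, 4],
    "B1": [5, 7, 6], "B2": [6, 9, 6],
    "C1": [13, 16, 2], "C2": [12, 19, 3],
    "D1": [3, 11, 53], "D2": [3, 9, 48],
})

# Plain NumPy arrays of shape (n_rows, n_replicates).
b = df[["B1", "B2"]].to_numpy()
d = df[["D1", "D2"]].to_numpy()

# Student's t-test (pooled variance), one test per row.
student = ttest_ind(b, d, axis=1)[1]

# Welch's t-test (no equal-variance assumption), one test per row.
welch = ttest_ind(b, d, axis=1, equal_var=False)[1]

df["p-val"] = student
df["p-val-welch"] = welch
print(df)
```

With only two replicates per group, either test has very few degrees of freedom, so the p-values are sensitive to this choice.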
Upvotes: 1