Reputation: 7681
I have some experimental data. The experiment measured 126 genes over time in three different cell lines with n = 6. The normalized measurement is known as the `delta_ct` value. The data is stored in a `pandas.DataFrame` which looks like this:
```
                      Gene     Group  Time  Repeat  delta_ct
Group    Time Repeat
Adult    0    1      SMAD3     Adult     0       1  0.115350
              2      SMAD3     Adult     0       2  0.076046
              3      SMAD3     Adult     0       3  0.081212
              4      SMAD3     Adult     0       4  0.083205
              5      SMAD3     Adult     0       5  0.101456
              6      SMAD3     Adult     0       6  0.089714
         1    1      SMAD3     Adult     1       1  0.088079
              2      SMAD3     Adult     1       2  0.093965
              3      SMAD3     Adult     1       3  0.114951
              4      SMAD3     Adult     1       4  0.082359
              5      SMAD3     Adult     1       5  0.080788
              6      SMAD3     Adult     1       6  0.103181
Neonatal 24   1      SMAD3  Neonatal    24       1  0.039883
              2      SMAD3  Neonatal    24       2  0.037161
              3      SMAD3  Neonatal    24       3  0.042874
              4      SMAD3  Neonatal    24       4  0.047950
              5      SMAD3  Neonatal    24       5  0.053673
              6      SMAD3  Neonatal    24       6  0.040181
         30   1      SMAD3  Neonatal    30       1  0.035015
              2      SMAD3  Neonatal    30       2  0.042596
              3      SMAD3  Neonatal    30       3  0.038034
              4      SMAD3  Neonatal    30       4  0.040363
              5      SMAD3  Neonatal    30       5  0.034818
              6      SMAD3  Neonatal    30       6  0.031685
```
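For anyone who wants to reproduce this layout, a minimal sketch (assuming the frame was built from long-format data) is to call `set_index` with `drop=False`, which keeps `Group`, `Time` and `Repeat` both as index levels and as ordinary columns. The frame below holds only the six SMAD3 Adult/Time 0 rows from the table; the names `values`, `raw` and `adult_t0` are illustrative.

```python
import pandas as pd

# Only the SMAD3 / Adult / Time 0 rows from the table above; scalar columns
# are broadcast to the length of the list-like columns.
values = [0.115350, 0.076046, 0.081212, 0.083205, 0.101456, 0.089714]
raw = pd.DataFrame({
    "Gene": "SMAD3",
    "Group": "Adult",
    "Time": 0,
    "Repeat": range(1, 7),
    "delta_ct": values,
})

# drop=False keeps Group/Time/Repeat as columns as well as index levels
df = raw.set_index(["Group", "Time", "Repeat"], drop=False)

# A tuple selects on the first two index levels; the result is indexed by Repeat
adult_t0 = df.loc[("Adult", 0), "delta_ct"]
```

One caveat of this layout: because the names are both index levels and columns, `df.groupby("Group")` raises an ambiguity error, so grouping by the index level (`level="Group"`) or by a renamed column is needed.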
Note that I kept the columns used to create the index as ordinary columns because that makes plotting with seaborn a bit easier. My question is: how would I perform a t-test of the hypothesis that, at each time point, the mean `delta_ct` differs significantly between the cell lines?
For example, with the data above I want to perform a t-test on `df.loc[('Adult', 0)]` and `df.loc[('Neonatal', 0)]`, i.e. the same time point but different cell lines.
Upvotes: 3
Views: 4668
Reputation: 294228
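Use Welch's t-test, which you can access via scipy's `ttest_ind` by passing `equal_var=False` (the default, `equal_var=True`, performs Student's t-test, which assumes equal variances):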
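```python
from scipy.stats import ttest_ind
adult = df.loc[('Adult', 0), 'delta_ct']
neonatal = df.loc[('Neonatal', 0), 'delta_ct']
ttest_ind(adult, neonatal, equal_var=False)  # equal_var=False -> Welch's t-test
```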
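To cover the "for each time point" part of the question, one option is to loop over the time points that two groups have in common and run the same Welch test at each. `welch_by_time` is a hypothetical helper, not a pandas or scipy function; it assumes the (Group, Time, Repeat) index shown in the question and a frame holding a single gene, as in the excerpt, so with all 126 genes you would apply it per gene.

```python
import pandas as pd
from scipy.stats import ttest_ind


def welch_by_time(df, group_a, group_b, value="delta_ct"):
    """Hypothetical helper: Welch's t-test of `value` between two groups
    at every Time that both groups share.

    Assumes a (Group, Time, Repeat) MultiIndex and a single gene per frame.
    """
    df = df.sort_index()  # a sorted MultiIndex avoids lexsort PerformanceWarnings
    groups = df.index.get_level_values("Group")
    times = df.index.get_level_values("Time")
    shared = sorted(set(times[groups == group_a]) & set(times[groups == group_b]))

    rows = []
    for t in shared:
        a = df.loc[(group_a, t), value]
        b = df.loc[(group_b, t), value]
        res = ttest_ind(a, b, equal_var=False)  # equal_var=False -> Welch
        rows.append({"Time": t, "t": res.statistic, "p": res.pvalue})
    return pd.DataFrame(rows, columns=["Time", "t", "p"]).set_index("Time")
```

Calling `welch_by_time(df, 'Adult', 'Neonatal')` would return one row per shared time point with the t statistic and two-sided p-value.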
Or if you'd prefer, you can write your own function.
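For reference, the statistic such a function computes is Welch's t, where $\bar{x}_i$ are the sample means, $s_i$ the sample standard deviations (ddof = 1) and $n_i$ the sample sizes. The degrees of freedom $\nu$ are not part of the hand-written function; the expression for them is the standard Welch-Satterthwaite approximation, which is what a p-value would be based on.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}
                 {\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}
```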
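```python
import numpy as np

def welch_ttest(x1, x2):
    """Welch's t statistic for two independent samples (unequal variances).
    Returns only the t statistic, not a p-value."""
    x_1 = x1.mean()
    x_2 = x2.mean()
    s1 = x1.std()  # pandas .std() uses ddof=1 (sample SD); NumPy arrays default to ddof=0
    s2 = x2.std()
    n1 = len(x1)
    n2 = len(x2)
    return (x_1 - x_2) / np.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

welch_ttest(df.loc[('Adult', 0), 'delta_ct'], df.loc[('Neonatal', 0), 'delta_ct'])
```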
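The hand-written function stops at the t statistic. A minimal sketch of also computing the Welch-Satterthwaite degrees of freedom and a two-sided p-value, using `scipy.stats.t.sf` (the survival function of Student's t distribution), follows; `welch_ttest_p` is a hypothetical name, and its t and p should agree with `ttest_ind(..., equal_var=False)`.

```python
import numpy as np
from scipy import stats


def welch_ttest_p(x1, x2):
    """Hypothetical extension of welch_ttest: returns (t, dof, two-sided p)."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, n2 = len(x1), len(x2)
    v1 = x1.var(ddof=1) / n1  # squared standard error of each mean
    v2 = x2.var(ddof=1) / n2
    t = (x1.mean() - x2.mean()) / np.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    dof = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    p = 2 * stats.t.sf(abs(t), dof)  # two-sided p-value
    return t, dof, p
```

It accepts the same pandas selections as above, e.g. `welch_ttest_p(df.loc[('Adult', 0), 'delta_ct'], df.loc[('Neonatal', 0), 'delta_ct'])`.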
Upvotes: 3