Reputation: 1292
I want to calculate the means of multiple columns for two different groups in pandas and test whether those means differ. I can work out the calculation part, but I have found no good solution so far for the testing part. Below are a toy sample and the result I want.
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,100,size=(100, 2)), columns=['col_1','col_2'])
df['group'] = ['A']*50 + ['B']*50
df.groupby('group').agg({"col_1":"mean","col_2":"mean"})
       col_1  col_2
group
A      52.26  56.58
B      53.04  49.18
What I want to have:
       col_1  t_col_1  col_2  t_col_2
group
A      52.26   4.3***  56.58      0.8
B      53.04   4.3***  49.18      0.8
Here t_col_1 is the t statistic for the difference between the means of col_1 in group A and group B, i.e. t.test(df.loc[df['group'].isin(['B'])]['col_1'], df.loc[df['group'].isin(['A'])]['col_1']). The stars are not necessary, but it would be great if they could be there.
Any suggestions on how to do this?
Upvotes: 3
Views: 3749
Reputation: 10580
You can iterate over the columns and run a t-test between the two groups for each one:
import pandas as pd
import scipy.stats as stats
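tstats = {}
# Boolean mask for group A; every other row is treated as group B
ix_a = df['group'] == 'A'
for x in df:
    if x != 'group':
        # [0] is the t statistic; [1] would be the p-value
        tstats['t_' + x] = stats.ttest_ind(df[x][ix_a], df[x][~ix_a])[0]
# Group means, with each t statistic repeated on both rows
df.groupby('group').mean().assign(**tstats)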
Result:
       col_1  col_2  t_col_1   t_col_2
group
A      56.24  46.84  0.85443 -0.281279
B      51.24  48.42  0.85443 -0.281279
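If you also want the stars and the interleaved column order from the question, one possible extension is sketched below. The helper names `stars` and `means_with_ttests` are made up for this example, and the 0.1 / 0.05 / 0.01 cutoffs are an assumed convention, so adjust them to whatever your field uses. It compares B against A, as in the question.

```python
import numpy as np
import pandas as pd
from scipy import stats


def stars(p):
    """Map a p-value to significance stars (cutoffs are an assumed convention)."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.1:
        return "*"
    return ""


def means_with_ttests(df, group_col="group", a="A", b="B"):
    """Group means plus a formatted t statistic (b vs. a) next to each column."""
    value_cols = [c for c in df.columns if c != group_col]
    out = df.groupby(group_col)[value_cols].mean()
    is_a = df[group_col] == a
    is_b = df[group_col] == b
    for col in value_cols:
        # ttest_ind returns the t statistic and the two-sided p-value
        t, p = stats.ttest_ind(df.loc[is_b, col], df.loc[is_a, col])
        # Same string on every row, since the test is between the groups
        out["t_" + col] = f"{t:.2f}{stars(p)}"
    # Put each t column right after the column it belongs to
    ordered = [name for col in value_cols for name in (col, "t_" + col)]
    return out[ordered]


# Toy data shaped like the question's sample
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(100, 2)), columns=["col_1", "col_2"])
df["group"] = ["A"] * 50 + ["B"] * 50

result = means_with_ttests(df)
print(result)
```

Note that the t columns become strings here, so keep the numeric values separately if you need them later. `ttest_ind` assumes equal variances by default; passing `equal_var=False` gives Welch's t-test instead.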
Upvotes: 2