Reputation: 15793
Given a pandas DataFrame with columns for group, x, and y (multiple records per group value), I'd like to create a new DataFrame with one row per group and the associated t-statistic for the x and y values in that group. I'd like to do this with groupby, not a loop.
Example:
import pandas as pd
import numpy as np
from scipy import stats
N = 100 # Observations per group.
tt_df = pd.DataFrame({'group': np.append(['A'] * N, ['B'] * N),
                      'x': np.random.randn(2 * N)})
tt_df['y'] = tt_df['x'] + np.random.randn(2 * N)
stats.ttest_ind(tt_df['x'], tt_df['y'])[0]  # Global t-statistic, e.g. -0.32 (varies between runs; no seed is set).
Upvotes: 6
Views: 8183
Reputation: 15793
tt_df.groupby('group').apply(lambda df: stats.ttest_ind(df['x'], df['y'])[0])
# group
# A -0.292413
# B -0.167816
# dtype: float64
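If you also want the p-value, or a proper DataFrame rather than a Series, one possible variation is to return a pd.Series from the applied function. This is only a sketch: the helper name ttest_row, the output column names t_stat and p_value, and the fixed seed are my own choices for illustration, not part of the question. Selecting the x and y columns before apply is an assumption made to keep the grouping column out of the applied frames.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Rebuild data shaped like the question's example; the seed is an assumption
# added here so that repeated runs give the same numbers.
rng = np.random.default_rng(0)
n = 100  # Observations per group.
tt_df = pd.DataFrame({
    'group': np.repeat(['A', 'B'], n),
    'x': rng.standard_normal(2 * n),
})
tt_df['y'] = tt_df['x'] + rng.standard_normal(2 * n)


def ttest_row(df):
    """Hypothetical helper: independent two-sample t-test of x vs. y for one group."""
    t_stat, p_value = stats.ttest_ind(df['x'], df['y'])
    return pd.Series({'t_stat': t_stat, 'p_value': p_value})


# One row per group, with the statistic and p-value as columns.
result = tt_df.groupby('group')[['x', 'y']].apply(ttest_row)
print(result)
```

Because each call returns a Series with the same labels, apply stacks them into a DataFrame indexed by group. Note that ttest_ind treats x and y as independent samples, as in the question.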
Upvotes: 6