Mish

Reputation: 51

Code to do ANOVA on different Dataframes where groups can change

I have the following DataFrame, but the code should work for any DataFrame with the same structure: one numeric column and one column of group labels.

import pandas as pd

df = pd.DataFrame({'Weight': [4.17,5.58,5.18,6.11,4.5,4.61,5.17,4.53,5.33,5.14,4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69,6.31,5.12,5.54,5.5,5.37,5.29,4.92,6.15,5.8,5.26],
                   'Group': ['A','A','A','A','A','A','A','A','A','A','B','B','B','B','B','B','B','B','B','B','C','C','C','C','C','C','C','C','C','C']
                   })

How can I run an ANOVA on this to get the F and p values without naming the groups explicitly in the code? In other words, is there a way to detect the groups automatically so that the same code works on any DataFrame with this structure, not just this one?

Upvotes: 0

Views: 1566

Answers (1)

Erfan

Reputation: 42886

To test whether the group means are equal, we can use scipy.stats.f_oneway, either on all groups at once (with groupby) or on each pair of groups (with itertools.combinations).

The null hypothesis here is, as quoted from the documentation:

that two or more groups have the same population mean


Scenario 1: comparing all groups:

from scipy.stats import f_oneway

grps = [d['Weight'] for _, d in df.groupby('Group')]

F, p = f_oneway(*grps)

print(F, p)
4.846087862380136 0.0159099583256229
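
Since the question asks for something that works on any DataFrame of this shape, Scenario 1 could be wrapped in a small function. This is only a sketch: the name anova_by_group and its parameters are made up for illustration, and it assumes one numeric column and one column of group labels.

```python
import pandas as pd
from scipy.stats import f_oneway


def anova_by_group(data: pd.DataFrame, value_col: str, group_col: str):
    """One-way ANOVA across every group found in `group_col`.

    Hypothetical helper, not part of pandas or SciPy.
    Returns the (F, p) pair computed by scipy.stats.f_oneway.
    """
    # groupby discovers the group labels, so none are hard-coded.
    samples = [grp[value_col].dropna().to_numpy()
               for _, grp in data.groupby(group_col)]
    if len(samples) < 2:
        raise ValueError("need at least two groups to run an ANOVA")
    F, p = f_oneway(*samples)
    return F, p
```

Calling anova_by_group(df, 'Weight', 'Group') on the DataFrame from the question should give the same F and p as printed above.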

Scenario 2: comparing each pair of groups:

from itertools import combinations
from scipy.stats import f_oneway

combs = list(combinations(df['Group'].unique(), 2))
for g1, g2 in combs:
    a = f_oneway(df.loc[df['Group'] == g1, 'Weight'], 
                 df.loc[df['Group'] == g2, 'Weight'])
    print(f'For groups {g1} & {g2} the F-value is: {a[0]}, the p-value is: {a[1]}')

Output

For groups A & B the F-value is: 1.4191012973623165, the p-value is: 0.24902316597300575
For groups A & C the F-value is: 4.554043294351827, the p-value is: 0.04685138491157386
For groups B & C the F-value is: 9.0606932332992, the p-value is: 0.007518426118219876
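
If the pairwise results need to be reused rather than printed, they could be collected into a DataFrame instead. Again a sketch only: pairwise_anova and the output column names are assumptions chosen for illustration.

```python
from itertools import combinations

import pandas as pd
from scipy.stats import f_oneway


def pairwise_anova(data: pd.DataFrame, value_col: str, group_col: str) -> pd.DataFrame:
    """Run f_oneway on every pair of groups and tabulate F and p.

    Hypothetical helper, not part of pandas or SciPy.
    """
    # Split once so each group's values are looked up, not re-filtered per pair.
    samples = {name: grp[value_col].dropna()
               for name, grp in data.groupby(group_col)}
    rows = []
    for g1, g2 in combinations(samples, 2):
        F, p = f_oneway(samples[g1], samples[g2])
        rows.append({'group_1': g1, 'group_2': g2, 'F': F, 'p': p})
    return pd.DataFrame(rows, columns=['group_1', 'group_2', 'F', 'p'])
```

pairwise_anova(df, 'Weight', 'Group') returns one row per pair, with the same F and p values as in the output above.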

Upvotes: 1
