Reputation: 4623
I want to apply the Kruskal-Wallis test to several columns. I do it as below:
import pandas as pd
import scipy
df = pd.DataFrame({'a':range(9), 'b':[1,2,3,1,2,3,1,2,3], 'group':['a', 'b', 'c']*3})
and then the loop:
groups = {}
res = []
for grp in df['group'].unique():
    for column in df[[0, 1]]:
        groups[grp] = df[column][df['group']==grp].values
args = groups.values()
g = scipy.stats.kruskal(*args)
res.append(g)
print (res)
I get
[KruskalResult(statistic=8.0000000000000036, pvalue=0.018315638888734137)]
But I want
[KruskalResult(statistic=0.80000000000000071, pvalue=0.67032004603563911)]
[KruskalResult(statistic=8.0000000000000036, pvalue=0.018315638888734137)]
Where is my mistake?
For a single column I do it as below:
import pandas as pd
import scipy
df = pd.DataFrame({'numbers':range(9), 'group':['a', 'b', 'c']*3})
groups = {}
for grp in df['group'].unique():
    groups[grp] = df['numbers'][df['group']==grp].values
print(groups)
args = groups.values()
scipy.stats.kruskal(*args)
Upvotes: 0
Views: 284
Reputation: 4623
Before, I had it like this:
groups = {}
res = []
for column in df[[0, 1]]:
    for grp in df['group'].unique():
        groups[grp] = df[column][df['group']==grp].values
args = groups.values()
g = scipy.stats.kruskal(*args)
res.append(g)
print (res)
and I got
[KruskalResult(statistic=8.0000000000000036, pvalue=0.018315638888734137)]
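The problem was the indentation: the lines from args = groups.values() through res.append(g) have to sit inside the column loop, so that the test runs once per column.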
Upvotes: 0
Reputation: 32085
Your for loops are nested the wrong way round: the single-column procedure is the part that stays the same whichever column you choose, so the loop over columns must be the outer loop. In plain English: "for each column, apply the Kruskal test, which consists of the loop over df['group'].unique()":
groups = {}
res = []
for column in df[[0, 1]]:
    for grp in df['group'].unique():
        groups[grp] = df[column][df['group']==grp].values
    args = groups.values()
    g = scipy.stats.kruskal(*args)
    res.append(g)
print (res)
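For reference, here is a minimal self-contained sketch of the same "column loop outside, group split inside" idea. It makes a few assumptions that are not in the code above: the columns are selected by name (['a', 'b']), since as far as I know positional selection with df[[0, 1]] raises a KeyError on recent pandas versions when the columns are labelled with strings; scipy.stats is imported explicitly; and the groups are split with groupby instead of a dict. The helper name kruskal_by_column is made up for illustration.

```python
import pandas as pd
import scipy.stats

df = pd.DataFrame({'a': range(9),
                   'b': [1, 2, 3, 1, 2, 3, 1, 2, 3],
                   'group': ['a', 'b', 'c'] * 3})


def kruskal_by_column(frame, columns, group_col='group'):
    """Run one Kruskal-Wallis test per column (hypothetical helper).

    Returns a dict mapping each column name to its KruskalResult.
    """
    results = {}
    for column in columns:
        # Outer loop: one test per column.
        # Inner step: build one sample array per group, for this column only.
        samples = [values.to_numpy()
                   for _, values in frame.groupby(group_col)[column]]
        results[column] = scipy.stats.kruskal(*samples)
    return results


res = kruskal_by_column(df, ['a', 'b'])
for column, result in res.items():
    print(column, result.statistic, result.pvalue)
```

Because the samples are rebuilt from scratch for every column, nothing carries over from one column to the next, and on the example frame this should give the two statistics asked for in the question (about 0.8 for a and 8.0 for b).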
Upvotes: 1