Reputation: 2153
Inspiration
In R, this is very easy:
data("iris")
bartlett.test(Sepal.Length ~ Species,data = iris)
The important thing about the data set is that the column Sepal.Length is numerical and the column Species is categorical.
Problem
In Python, scipy.stats.bartlett needs a separate array for each species (see the docs).
What would be the easiest way to achieve this?
An easy way to get the dataset in python:
import numpy as np
import pandas as pd
import scipy.stats as ss
from sklearn import datasets

iris = datasets.load_iris()
iris = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                    columns=["sepal.length", "sepal.width", "petal.length", "petal.width"] + ['species'])
I really wanted this to work:
iris.groupby("species")["sepal.length"].apply(ss.bartlett)
but it didn't, because apply calls the function once per group with a single Series, whereas bartlett needs all the sample vectors in one call.
Upvotes: 1
Views: 323
Reputation: 2939
Following the groupby pattern, you can do a bit of manipulation and get this:
gb = iris.groupby('species')["sepal.length"]
ss.bartlett(*[gb.get_group(x).values for x in gb.groups])
The * unpacks the list into separate positional arguments for the function; the rest just gets the groups into the form the function expects. As mentioned in the comments, the .values isn't needed here, so we can write it as:
gb = iris.groupby('species')["sepal.length"]
ss.bartlett(*[gb.get_group(x) for x in gb.groups])
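To make the unpacking concrete, here is a minimal self-contained sketch. It assumes a small made-up DataFrame in place of iris (the numbers and the species labels "a", "b", "c" are illustrative only) and checks that the starred call matches passing each group by hand:

```python
import pandas as pd
import scipy.stats as ss

# Made-up toy data standing in for iris; the values are illustrative only.
df = pd.DataFrame({
    "species": ["a"] * 4 + ["b"] * 4 + ["c"] * 4,
    "sepal.length": [5.1, 4.9, 4.7, 5.0,
                     7.0, 6.4, 6.9, 5.5,
                     6.3, 5.8, 7.1, 6.5],
})

gb = df.groupby("species")["sepal.length"]

# One Series per species, in the order the groups appear in gb.groups.
samples = [gb.get_group(name) for name in gb.groups]

# The starred call spreads the list over bartlett's positional arguments...
unpacked = ss.bartlett(*samples)

# ...so it is the same call as listing the three samples explicitly.
explicit = ss.bartlett(samples[0], samples[1], samples[2])

print(unpacked.statistic, unpacked.pvalue)
```

The result exposes the test statistic and p-value as attributes, which is handy when you only need one of the two.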
And just for completeness, if you really want to do it in one line:
ss.bartlett(*[x[1] for x in iris.groupby('species')["sepal.length"]])
But I personally find that less readable.
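If you need this more than once, the pattern can be wrapped in a small function that reads a bit like the R formula call. This is only a sketch: bartlett_by_group is a hypothetical helper name, not part of pandas or SciPy, and the toy data below is made up.

```python
import pandas as pd
import scipy.stats as ss


def bartlett_by_group(df, value_col, group_col):
    """Hypothetical helper: Bartlett's test of value_col across the levels of group_col."""
    # Iterating over a grouped column yields (group_name, Series) pairs;
    # keep only the values, one NumPy array per group.
    samples = [group.to_numpy() for _, group in df.groupby(group_col)[value_col]]
    # Unpack so that each group becomes its own positional argument.
    return ss.bartlett(*samples)


# Made-up toy data; the values are illustrative only.
toy = pd.DataFrame({
    "species": ["a"] * 4 + ["b"] * 4 + ["c"] * 4,
    "sepal.length": [5.1, 4.9, 4.7, 5.0,
                     7.0, 6.4, 6.9, 5.5,
                     6.3, 5.8, 7.1, 6.5],
})

result = bartlett_by_group(toy, "sepal.length", "species")
print(result.statistic, result.pvalue)
```

With the iris DataFrame from the question, the call would be bartlett_by_group(iris, "sepal.length", "species").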
Upvotes: 4