One-way ANOVA of individual rows of a DataFrame

Question

I am trying to run a one-way ANOVA on each row of my DataFrame (there are 519 rows; each row represents a different biological taxon and each column a different sample). However, I keep getting an invalid syntax error, and I believe the problem lies in how I select the rows. I am fairly new to Python and pandas, so here's what I have so far, with Subj1 being the name of my DataFrame:

for x in range(0,24):
    print(scipy.stats.f_oneway(Subj1.iloc[[x,:],:]))

How would I go about iterating through the rows so that I get the ANOVA values for each row?

Thanks in advance!

Edit: I tried converting the DataFrame to its values and then running the iteration like so, to no avail:

Subj1Values=Subj1.values
for x in range(0,24):
    print(scipy.stats.f_oneway(Subj1Values[x]))

Edit 2: I tried this, but it still returns (nan, nan) multiple times:

Subj1Values=Subj1.values
for i in range(0,24):
    print(stats.f_oneway(Subj1Values[[i],[0]],Subj1Values[[i],[1]],Subj1Values[[i],[2]],Subj1Values[[i],[3]],Subj1Values[[i],[4]],Subj1Values[[i],[5]]))

unutbu · Accepted Answer

itertools.product generates the Cartesian product of two sequences of items. For example,

In [4]: import itertools as IT

In [5]: list(IT.product([1,2,3], [4,5,6]))
Out[5]: [(1, 4), (1, 5), (1, 6), (2, 4), (2, 5), (2, 6), (3, 4), (3, 5), (3, 6)]

Therefore, to generate all pairs of rows and columns, you could use

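import itertools as IT
import scipy.stats as stats

# Subj1 is the DataFrame from the question; work on its underlying NumPy array
arr = Subj1.values
rows = range(arr.shape[0])
columns = range(arr.shape[1])
# Visit every (row index, column index) pair and compare that row with that column
for i, j in IT.product(rows, columns):
    print(stats.f_oneway(arr[i, :], arr[:, j]))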

Note that your data seems more like an array than a DataFrame. DataFrames have an index on the rows and names for the columns. You are not using either of these here, which suggests that you may not need a DataFrame at all. Moreover, the values in the rows and columns are being treated as qualitatively the same things, which is usually not true of data in a DataFrame. So you may be better off making Subj1 a NumPy array rather than a pandas DataFrame.
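
If the goal is instead one ANOVA per row, with the columns split into groups of samples, something like the following may be closer to what the question asks for. This is only a sketch under assumptions that are not in the question: the toy data, the column names, the `groups` mapping, and the helper `rowwise_anova` are all hypothetical. It also hints at why Edit 2 likely returns (nan, nan): f_oneway compares two or more groups, and when every group holds a single value there is no within-group variance to estimate.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical toy data: 4 taxa (rows) x 6 samples (columns).
rng = np.random.default_rng(0)
subj1 = pd.DataFrame(
    rng.normal(size=(4, 6)),
    index=["taxon_a", "taxon_b", "taxon_c", "taxon_d"],
    columns=["s1", "s2", "s3", "s4", "s5", "s6"],
)

# Assumption: the samples fall into groups (e.g. conditions).
# The question does not say how its columns are grouped.
groups = {
    "control": ["s1", "s2", "s3"],
    "treated": ["s4", "s5", "s6"],
}


def rowwise_anova(df, groups):
    """Run one one-way ANOVA per row, comparing the column groups."""
    records = []
    for taxon, row in df.iterrows():
        # One array of observations per group, taken from this row only.
        samples = [row[cols].to_numpy(dtype=float) for cols in groups.values()]
        f_stat, p_value = stats.f_oneway(*samples)
        records.append({"taxon": taxon, "F": f_stat, "p": p_value})
    return pd.DataFrame(records).set_index("taxon")


result = rowwise_anova(subj1, groups)
print(result)
```

Each group here contains three observations per row, so f_oneway has a within-group variance to work with. Keeping the row labels as the index of the result makes it easy to look up the F statistic and p-value for a given taxon.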
