Reputation: 153
I am working on a large gene array in which a single gene is represented by multiple probes, anywhere from 2 to 20. I want to run a Kruskal-Wallis test (a non-parametric one-way ANOVA) to determine whether the probes for a single gene give essentially the same information. So I have written the following code, where genes is a (1 x m) array of gene labels, one per probe (m is the number of probes), and X is an (n x m) NumPy array with one column per probe.
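import numpy as np
from scipy.stats import kruskal

ugenes = np.unique(genes)
for gene in ugenes:
    idx = np.flatnonzero(genes == gene)  # column indices of this gene's probes
    # the length of idx is variable (anywhere from 2 to 20), so Xd has a
    # variable number of columns
    Xd = X[:, idx]
    h, p = kruskal(Xd)  # does not work: kruskal expects one 1-D array per group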
The problem I run into is that kruskal takes each group as a separate one-dimensional array argument, not a single 2-D array. Since the number of columns in Xd varies, I do not know in advance how many 1-D arrays I would have to split it into, and the number of unique genes is on the order of 20000, so doing it manually is not an option. Is there any way to break up Xd on the fly so that it can be passed to kruskal as
kruskal(Xd[:, 0], Xd[:, 1], ..., Xd[:, z-1])
(where z is the total number of columns of Xd) without hard-coding the number of columns?
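For reference, a minimal sketch of the fixed-arity call that does work: scipy.stats.kruskal receives each group as its own 1-D positional argument and returns the H statistic and the p-value. The random 10 x 3 matrix below is hypothetical stand-in data, not the gene array described above.

```python
import numpy as np
from scipy.stats import kruskal

# Hypothetical stand-in data: 3 probes (columns) measured on 10 samples (rows)
rng = np.random.default_rng(0)
Xd = rng.normal(size=(10, 3))

# Works only because the number of groups (3) is known in advance:
# each column is passed as a separate 1-D argument
h, p = kruskal(Xd[:, 0], Xd[:, 1], Xd[:, 2])
print(h, p)
```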
Upvotes: 1
Views: 54
Reputation: 12407
You can convert the columns to a list of 1-D arrays and unpack that list into separate arguments with the * operator:
Xd = list(X[:, idx].T)
h, p = kruskal(*Xd)
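Putting the unpacking into the original loop: the sketch below builds a small hypothetical dataset (random X, made-up gene labels with 2 to 4 probes each) and collects one Kruskal-Wallis p-value per unique gene in a dict. The dataset and the name pvals are illustrative assumptions. Iterating over a 2-D NumPy array yields its rows, so *Xd.T passes each column of Xd as a separate 1-D argument, the same as *list(X[:, idx].T).

```python
import numpy as np
from scipy.stats import kruskal

# Hypothetical dataset: n samples (rows) x m probes (columns)
rng = np.random.default_rng(42)
genes = np.array(["gA", "gA", "gB", "gB", "gB", "gC", "gC", "gC", "gC"])  # one label per probe
n, m = 15, genes.size
X = rng.normal(size=(n, m))

pvals = {}
for gene in np.unique(genes):
    idx = np.flatnonzero(genes == gene)  # column indices of this gene's probes
    Xd = X[:, idx]                       # n x len(idx), width varies per gene
    # Xd.T iterates over the columns of Xd; * unpacks them as separate 1-D arguments
    h, p = kruskal(*Xd.T)
    pvals[gene] = p

for gene, p in pvals.items():
    print(gene, p)
```

A gene with only one probe would make kruskal raise an error, since the test needs at least two groups; the question states that every gene has at least two probes.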
Upvotes: 1