Wilcoxon rank-sum test between two data frames in Python

Question

I am trying to perform a Wilcoxon rank-sum test between two data frames, comparing them row by row. For example, the test should be run between row 1 of df1 (A, 1, 2, 3) and row 1 of df2 (A, 10, 12, 13), then between row 2 of df1 (B, 4, 5, 6) and row 2 of df2 (B, 14, 15, 16), and so on.

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.array([['A', 1, 2, 3], ['B', 4, 5, 6], ['C', 7, 8, 9]]),
                   columns=['Details', 'a', 'b', 'c'])

df2 = pd.DataFrame(np.array([['A', 10, 12, 13], ['B', 14, 15, 16], ['C', 17, 18, 19]]),
                   columns=['Details', 'a', 'b', 'c'])

The result should be a column of p-values, one for each pair of rows, in this shape (the values below are placeholders):

out = pd.DataFrame(np.array([['A',0.05], ['B',0.0002], ['C',1]]),
                   columns=['details','P'])

One way is to use a for loop, but my original dataset has 28,000 rows and the experiment has to be repeated at least 1,000 times. Is there a better strategy for this? Thank you very much in advance for your help.

gañañufla · Accepted Answer

One way to calculate this is with ranksums from scipy.stats, shown here for the first row of each data frame:

from scipy.stats import ranksums
import numpy as np
import pandas as pd


df1=pd.DataFrame(np.array([['A',1, 2, 3], ['B',4, 5, 6], ['C',7, 8, 9]]),
                   columns=['Details','a', 'b', 'c'])

 
df2=pd.DataFrame(np.array([['A',10, 12, 13], ['B',14, 15, 16], ['C',17, 18, 19]]),
                   columns=['Details','a', 'b', 'c'])


a = df1.loc[0, 'a':].values.astype(int)  # First row of df1, columns 'a' onward, cast from str to int
b = df2.loc[0, 'a':].values.astype(int)  # First row of df2, columns 'a' onward, cast from str to int

ranksums(a, b)
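To get a p-value for every row without a Python loop, the same statistic can be computed for all rows at once. The following is only a sketch under a few assumptions: the rows of df1 and df2 are already aligned (row i of one matches row i of the other), the value columns are numeric (built here from a dict rather than from a mixed np.array, which would turn everything into strings), and rowwise_ranksums is a hypothetical helper name, not a scipy function. It uses the same normal approximation as ranksums, without a tie correction.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm, rankdata, ranksums

# Same example data as above, but with numeric value columns.
df1 = pd.DataFrame({'Details': ['A', 'B', 'C'],
                    'a': [1, 4, 7], 'b': [2, 5, 8], 'c': [3, 6, 9]})
df2 = pd.DataFrame({'Details': ['A', 'B', 'C'],
                    'a': [10, 14, 17], 'b': [12, 15, 18], 'c': [13, 16, 19]})


def rowwise_ranksums(x, y):
    """Two-sided rank-sum p-value for each pair of rows (hypothetical helper).

    Assumes x and y are 2-D with the same number of rows, already aligned.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = x.shape[1], y.shape[1]
    # Rank the pooled values within each row.
    ranks = rankdata(np.concatenate([x, y], axis=1), axis=1)
    # Sum of the ranks that belong to the first sample, per row.
    s = ranks[:, :n1].sum(axis=1)
    expected = n1 * (n1 + n2 + 1) / 2.0
    z = (s - expected) / np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    # Two-sided p-value from the standard normal distribution.
    return 2 * norm.sf(np.abs(z))


cols = ['a', 'b', 'c']
out = pd.DataFrame({'Details': df1['Details'],
                    'P': rowwise_ranksums(df1[cols], df2[cols])})
print(out)
```

With only three values per row the normal approximation is rough, so it is worth checking a few rows against ranksums(a, b) on your real data before relying on the result.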
