BioProgram
BioProgram

Reputation: 704

How to calculate and add an FDR column to file in python

I have the following input file (Input.xls):

Mouse   No_neigh_mouse  Human   No_neigh_hum    Intersection    TotalGeneTested
Gm20645 1   lnc3    2   1   8
Gm20645 1   lnc2    1   0   8
Gm20645 1   lnc1    2   1   8
Gm26549 2   lnc3    2   1   8
Gm26549 2   lnc2    1   1   8
Gm26549 2   lnc1    2   1   8

I would like to:

  1. Calculate the hypergeom p-value for each row (done successfully)
  2. Then calculate the fdr (which is same as BH) for p-value corrections
  3. Add the adjusted p-value as the last column.

My expected output file will have four columns. First is the value of "Mouse", second is the value of "Human", third is "Hypergeom-pvalue", fourth is the "Adjusted-pvalue". I am able to generate the first 3 columns using the following code:

output=open("Hypergeom.xls", "w")
output.write("Mouse\tHuman\tHypergeom-pvalue\tAdjusted-pvalue\n")
Input = pd.read_table("Input.xls", sep="\t")

for i in range (0, len(Input.index)):
    hyperg= scipy.stats.hypergeom.sf(Input.ix[i,4], Input.ix[i,5], Input.ix[i,1], Input.ix[i,3],1) #calculates hypergeom p value without a problem
    newline = Input.ix[i,0], Input.ix[i,2], str(hyper)
    output.write('\t'.join(newline)+'\n')
    output.close()

Till here, the scripts works fine and I get the following output file ("Hypergeom.xls"):

Mouse   Human   Hypergeom-pvalue    Adjusted-pvalue
Gm20645 lnc3    0.25
Gm20645 lnc2    1
Gm20645 lnc1    0.25
Gm26549 lnc3    0.464285714
Gm26549 lnc2    0.25
Gm26549 lnc1    0.464285714

I then aimed at reopening the output file as an input and then calculate the fdr based on a command suggested by one of the users which utilizes R: How to implement R's p.adjust in Python

My code then:

import rpy2.robjects as R
pvaluefile = pd.read_table("Hypergeom.xls", sep="\t")
pvalue_list = pvaluefile.ix[:,2].tolist()  #converts the value column series to a list
#Now, i try to apply the command from the SO link above
p_adjusted = R['p.adjust'](R.FloatVector(pvalue_list),method='BH')
for v in p_adjusted:
    print v

I get an error at the step p_adjusted = R [...]. Error is: TypeError: 'module' object has no attribute 'getitem'

Hence, I have two problems:

  1. I cant figure out how to calculate the fdr by overcoming this error
  2. How can i add the fdr column at the end of the file as a fourth column?

Upvotes: 1

Views: 1928

Answers (0)

Related Questions