I am attempting to threshold a pandas DataFrame that contains gene IDs and statistical information. The input to my Python program is a config.yaml file that holds the initial threshold values and a path to a CSV file (which becomes the DataFrame). The problem seems to stem from passing my threshold variables into a "cut-down" DataFrame. Thresholding works when I hard-code the numeric values (in a deprecated method), but I get an empty DataFrame when I threshold using variables that point to values in the config file.
Below is my current implementation:
import pandas as pd
import yaml

# `file` is assumed to be an already-open handle to config.yaml
config = yaml.full_load(file)
# for item, doc in config.items():
# print (item, ":", doc)
input_path = config['DESeq_input']['path']
# print(input_path)
baseMean = config['baseMean']
log2FoldChange = config['log2FoldChange']
lfcSE = config['lfcSE']
pvalue = config['pvalue']
padj = config['padj']
df = pd.read_csv(input_path)
# print if 0 < than padj for test
# convert to #, most likely being read as string
# now use threshold value to cut down CSV
# only columns defined in config.yaml file
df_select = df[['genes', 'baseMean', 'log2FoldChange', 'lfcSE', 'pvalue', 'padj']]
# print(df_select)
# print(df_select['genes'])
df_threshold = df_select.loc[(df_select['baseMean'] < baseMean)
& (df_select['log2FoldChange'] < log2FoldChange)
& (df_select['lfcSE'] < lfcSE)
& (df_select['pvalue'] < pvalue)
& (df_select['padj'] < padj)]
print(df_threshold)
And below is my earlier (deprecated) implementation, which works:
df = pd.read_csv('/Users/nmaki/Documents/GitHub/IDEA/tests/eDESeq2.csv')
df_select = df[['genes', 'pvalue', 'padj', 'log2FoldChange']]
df_threshold = df_select.loc[(df_select['pvalue'] < 0.05)
& (df_select['padj'] < 0.1)
& (df_select['log2FoldChange'] < 0.5)]
print(df_threshold)
When I run my current code, I get:
Empty DataFrame
Columns: [genes, baseMean, log2FoldChange, lfcSE, pvalue, padj]
Index: []
Example contents of the csv file I am loading in as a dataframe:
"genes","baseMean","log2FoldChange","lfcSE","stat","pvalue","padj"
"ENSDARG00000000001",98.1095154977918,-0.134947665995593,0.306793322887575,-0.439865068527078,0.660034837008121,0.93904992415549
"ENSDARG00000000002",731.125841719954,0.666095249996351,0.161764851506172,4.11767602043598,3.82712199388831e-05,0.00235539468663284
"ENSDARG00000000018",367.699187187462,-0.170546910862128,0.147128047078344,-1.1591733476304,0.246385533026112,0.756573630543937
"ENSDARG00000000019",1133.08821430092,-0.131148919306121,0.104742185100469,-1.25211173683576,0.210529151546469,0.718240791187956
"ENSDARG00000000068",397.13408030651,-0.111332941901299,0.161417383863387,-0.689720891496564,0.49036972534723,0.8864754582597
"ENSDARG00000000069",1886.21783387126,-0.107901197025113,0.113522109960702,-0.950486183374019,0.341865271089735,0.82295928359482
"ENSDARG00000000086",246.197553048504,0.390421091410488,0.215725761369183,1.80980282063921,0.0703263703690051,0.466064880589034
"ENSDARG00000000103",797.782152145232,0.236382332789599,0.145111727277908,1.62896781138092,0.103319833277229,0.550658656731341
"ENSDARG00000000142",26.1411622212853,0.248419645848534,0.495298350652519,0.501555568519983,0.615980180267141,0.927327861190167
"ENSDARG00000000151",121.397701922367,0.276123125224845,0.244276041791451,1.13037333993066,0.25831894300396,0.766841249972654
"ENSDARG00000000161",22.2863001989718,0.837640942615127,0.542200061816621,1.54489274643135,0.122372208261173,0.587106227452529
"ENSDARG00000000183",215.47910609869,0.567221763062732,0.188807351259458,3.00423558340829,0.00266249076445763,0.0615311290935424
"ENSDARG00000000189",620.819069705942,0.0525797819665496,0.142171888686286,0.369832478504743,0.711507313969775,0.950479626809728
"ENSDARG00000000212",54472.1417532637,0.344813324409911,0.130070467015575,2.65097321722249,0.00802602056136946,0.132041563800088
"ENSDARG00000000229",172.985864037855,-0.0814838221355631,0.22200915791162,-0.367029103222856,0.713597309421024,0.95157821096128
"ENSDARG00000000241",511.449190233542,-0.431854805500191,0.157764756166574,-2.73733383801019,0.0061939401710654,0.114238610824236
"ENSDARG00000000324",179.189751392247,0.0141623609187069,0.206197755704643,0.0686833902256096,0.945241639658214,0.992706066946251
"ENSDARG00000000349",13.6578995386995,0.86981405362392,0.716688718472183,1.21365668414338,0.224878851627296,0.731932542953245
"ENSDARG00000000369",9.43959070533812,-0.042383076946964,0.868977019485631,-0.0487735302506061,0.961099776861288,NA
"ENSDARG00000000370",129.006520833067,0.619490133053518,0.250960632807829,2.46847533863165,0.0135690001510168,0.184768676917612
"ENSDARG00000000380",17.695581482726,-0.638493654324115,0.597289695632778,-1.06898488119351,0.285076482019819,0.786103920659844
"ENSDARG00000000394",2200.41651475378,-0.00605761754099435,0.0915611724486909,-0.0661592395443486,0.947251047773153,0.992978480118812
"ENSDARG00000000423",195.477813443242,-0.18634265895713,0.188820984694016,-0.986874733542448,0.323704052061987,0.810439992736898
"ENSDARG00000000442",1102.47980192551,0.0589654622770368,0.112333519273845,0.524914225586502,0.599642819781172,0.920807266898811
"ENSDARG00000000460",8.52822266110357,0.229130838495461,0.957763036484278,0.239235416034165,0.810923041830713,NA
"ENSDARG00000000472",0.840917787550721,-0.4234502342491,3.1634759582284,-0.133855998857105,0.893516444899853,NA
"ENSDARG00000000474",5.12612778660879,0.394871266508097,1.07671345623418,0.366737560696199,0.713814786364707,NA
"ENSDARG00000000476",75.8417047936895,0.242006157627571,0.349451220882324,0.692532013528336,0.488603288756242,0.885874315527816
"ENSDARG00000000489",1233.33364888202,0.0676458807753533,0.131846296650645,0.513066217965876,0.607905001380741,0.924392802283811
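The code comments above raise the possibility that values are being read as strings. A quick look at the column dtypes can confirm or rule that out. The sketch below is a minimal check that assumes three rows copied from the sample above, loaded from an in-memory string rather than the real file; `sample_csv` is a hypothetical name used only for illustration.

```python
import io

import pandas as pd

# Three rows copied from the sample above (in-memory stand-in for the real CSV).
sample_csv = '''"genes","baseMean","log2FoldChange","lfcSE","stat","pvalue","padj"
"ENSDARG00000000002",731.125841719954,0.666095249996351,0.161764851506172,4.11767602043598,3.82712199388831e-05,0.00235539468663284
"ENSDARG00000000183",215.47910609869,0.567221763062732,0.188807351259458,3.00423558340829,0.00266249076445763,0.0615311290935424
"ENSDARG00000000369",9.43959070533812,-0.042383076946964,0.868977019485631,-0.0487735302506061,0.961099776861288,NA
'''

df = pd.read_csv(io.StringIO(sample_csv))

# The statistical columns should come back as float64, not object (string).
print(df.dtypes)

# By default read_csv parses the literal NA as a missing value (NaN).
print(df["padj"].isna().sum())

# Comparisons against NaN evaluate to False, so a row with a missing padj
# never passes a condition such as df["padj"] < 0.1.
padj_mask = df["padj"] < 0.1
print(padj_mask.tolist())
```

If the dtypes are numeric, the CSV side is fine, and the remaining things to check are the values loaded from config.yaml and how strict they are.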
As it turns out, my thresholds were too restrictive: the current implementation filters on two additional variables (baseMean and lfcSE) that did not exist in my original implementation. After loosening the thresholds, I now get a populated DataFrame.
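One way to spot an overly restrictive threshold is to count how many rows pass each condition on its own before combining them. The following is a minimal sketch, not the code from the question. The `pvalue`, `padj`, and `log2FoldChange` thresholds come from the earlier working version, the `baseMean` and `lfcSE` thresholds are made-up placeholders standing in for whatever config.yaml holds, and the rows are three taken from the sample data with values rounded.

```python
import pandas as pd

# Three rows from the sample data in the question (values rounded).
df = pd.DataFrame(
    {
        "genes": ["ENSDARG00000000002", "ENSDARG00000000183", "ENSDARG00000000241"],
        "baseMean": [731.13, 215.48, 511.45],
        "log2FoldChange": [0.666, 0.567, -0.432],
        "lfcSE": [0.162, 0.189, 0.158],
        "pvalue": [3.83e-05, 0.00266, 0.00619],
        "padj": [0.00236, 0.0615, 0.114],
    }
)

# pvalue, padj and log2FoldChange thresholds are from the earlier version;
# baseMean and lfcSE are hypothetical placeholders for the config values.
thresholds = {
    "baseMean": 100,
    "log2FoldChange": 0.5,
    "lfcSE": 0.1,
    "pvalue": 0.05,
    "padj": 0.1,
}

# One boolean mask per column, so each condition can be inspected separately.
masks = {col: df[col] < limit for col, limit in thresholds.items()}

# Number of rows that pass each condition on its own.
passing = {col: int(mask.sum()) for col, mask in masks.items()}
print(passing)

# Combine all conditions with a logical AND, as in the question.
combined = pd.concat(masks, axis=1).all(axis=1)
df_threshold = df[combined]
print(df_threshold)
```

A count of zero for any single condition means the combined filter must be empty, which points to that threshold (or its comparison direction) as the one to revisit.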