Reputation: 2080
I have a couple of files with the same format that I need to filter using thresholds on three columns (log2FoldChange, pvalue and padj). In the end I need to save each filtered result as a separate file.
An example dataframe looks like this:
       ID        Mean  log2FoldChange        SE       stat    pvalue      padj
0   ENSG2    0.737466       -0.434579  0.484389  -0.897170  0.369628  0.607709
1  ENSG32  321.467787       -0.405760  0.170955  -2.373484  0.017621  0.097636
2  ENSG85    0.000000             NaN       NaN        NaN       NaN       NaN
I defined the following function to filter each dataframe and extract the subset I want to save:
def DEfilter(df):
    Up_regulted = df.query('log2FoldChange >= 0.58 and pvalue <= 0.05 and padj <= 0.05')
    Down_regulated = df.query('log2FoldChange <= -0.58 and pvalue <= 0.05 and padj <= 0.05')
    #Frames = [Up_regulted,Down_regulated]
    DE = pd.concat(Up_regulted,Down_regulated)
    return df
When I try to apply it to one of the dataframes,
Patient_pairs.apply(DEfilter,axis=1)
it throws the following error:
AttributeError: ("'Series' object has no attribute 'query'", 'occurred at index 0')
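A minimal reproduction of this error shows what apply actually hands to the function. This is a sketch assuming a small hypothetical frame built from the first two sample rows (only the columns used in the filter); the helper row_type is also hypothetical:

```python
import pandas as pd

# Hypothetical two-row frame built from the sample rows in the question
df = pd.DataFrame({
    "ID": ["ENSG2", "ENSG32"],
    "log2FoldChange": [-0.434579, -0.405760],
    "pvalue": [0.369628, 0.017621],
    "padj": [0.607709, 0.097636],
})

def row_type(row):
    # With axis=1, apply calls this once per row and passes the row as a Series
    return type(row).__name__

print(df.apply(row_type, axis=1).tolist())  # ['Series', 'Series']
print(hasattr(pd.Series, "query"))          # False: Series has no query method
print(hasattr(pd.DataFrame, "query"))       # True: query is a DataFrame method

# Calling query on a single row reproduces the AttributeError
try:
    df.iloc[0].query("pvalue <= 0.05")
except AttributeError as exc:
    print(exc)  # 'Series' object has no attribute 'query'
```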
This is what I have tried so far to filter each file and save the results as new files:
path = '/home/pathtofile'
files = os.listdir(path)
results = [os.path.join(path,i) for i in files if i.startswith('DE')]
for filename in results:
    name = os.path.basename(os.path.normpath(filename))
    df = pd.read_csv(filename, sep=sep, header=0)
    Up = df.query('log2FoldChange >= 0.58 and pvalue <= 0.05 and padj <= 0.05')
    Down = df.query('log2FoldChange <= -0.58 and pvalue <= 0.05 and padj <= 0.05')
    DE = pd.concat(Up,Down)
    DE.to_csv('Filtered_set_' + name, sep='\t',index=False)
Any help or suggestions would be appreciated.
Upvotes: 0
Views: 202
Reputation: 107707
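You are calling a DataFrame-level method on a Series. DataFrame.apply with axis=1 calls your function once per row and passes each row in as a Series, and Series has no query method, hence the AttributeError. Do not pass the function to DataFrame.apply (which applies a function along either the rows or the columns of a dataframe). Simply call the function directly and pass the whole data frame as its argument. Note also that pd.concat expects a list of frames as its first argument, and the function should return DE rather than the unfiltered df: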
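import os
import pandas as pd

path = '/home/pathtofile'
sep = '\t'  # assumption: the input files are tab-separated; change to match your data
files = os.listdir(path)
results = [os.path.join(path, i) for i in files if i.startswith('DE')]

def DEfilter(df):
    # Keep significant genes whose fold change passes the threshold in either direction
    Up_regulted = df.query('log2FoldChange >= 0.58 and pvalue <= 0.05 and padj <= 0.05')
    Down_regulated = df.query('log2FoldChange <= -0.58 and pvalue <= 0.05 and padj <= 0.05')
    # pd.concat takes a list of frames, not the frames as separate arguments
    DE = pd.concat([Up_regulted, Down_regulated])
    return DE

for filename in results:
    df = pd.read_csv(filename, sep=sep, header=0)
    DE = DEfilter(df)  # call on the whole DataFrame, not through apply
    name = os.path.basename(os.path.normpath(filename))
    DE.to_csv('Filtered_set_' + name, sep='\t', index=False)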
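As an optional simplification (a different formulation from the two-query approach above, not part of the original answer), the up and down conditions can be merged into a single boolean mask with Series.abs(). The function name de_filter, its lfc and alpha parameters, and the sample data are hypothetical; the thresholds match the question:

```python
import pandas as pd

def de_filter(df, lfc=0.58, alpha=0.05):
    """Keep rows that are significant and changed in either direction.

    Selects the same rows as concatenating the up- and down-regulated
    queries, but rows keep their original order.
    """
    mask = (
        (df["log2FoldChange"].abs() >= lfc)
        & (df["pvalue"] <= alpha)
        & (df["padj"] <= alpha)
    )
    # Comparisons with NaN evaluate to False, so rows with missing stats drop out
    return df[mask]

# Hypothetical sample frame, for illustration only
sample = pd.DataFrame({
    "ID": ["A", "B", "C", "D"],
    "log2FoldChange": [1.2, -0.9, 0.3, float("nan")],
    "pvalue": [0.01, 0.02, 0.001, float("nan")],
    "padj": [0.03, 0.04, 0.01, float("nan")],
})
print(de_filter(sample)["ID"].tolist())  # ['A', 'B']
```

Because one mask is applied, rows stay in their original file order instead of listing all up-regulated genes before the down-regulated ones; otherwise the selected rows are the same, so it could stand in for DEfilter in the loop.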
Upvotes: 2