ARJ

Reputation: 2080

AttributeError while filtering a dataframe based on values from certain columns

I have several files in the same format. I need to filter each one by thresholds on three of its columns and then save each filtered result as a separate file.

An example dataframe looks like this:

       ID        Mean  log2FoldChange        SE      stat    pvalue      padj
0   ENSG2    0.737466       -0.434579  0.484389 -0.897170  0.369628  0.607709
1  ENSG32  321.467787       -0.405760  0.170955 -2.373484  0.017621  0.097636
2  ENSG85    0.000000             NaN       NaN       NaN       NaN       NaN

I defined the following function to filter the dataframe and extract a subset from it:

def DEfilter(df):
    Up_regulted    = df.query('log2FoldChange >= 0.58 and pvalue <= 0.05 and padj <= 0.05')
    Down_regulated = df.query('log2FoldChange <= -0.58 and pvalue <= 0.05 and padj <= 0.05')
    #Frames         = [Up_regulted,Down_regulated]
    DE             = pd.concat(Up_regulted,Down_regulated)
    return df

When I try to apply it to one of the dataframes,

Patient_pairs.apply(DEfilter,axis=1)

it throws the following error:

 AttributeError: ("'Series' object has no attribute 'query'", 'occurred at index 0')

This is what I have tried so far to save the filtered results as new files:

path       = '/home/pathtofile'
files      = os.listdir(path)

results    = [os.path.join(path,i) for i in files if i.startswith('DE')]

for filename in results:
    name       = os.path.basename(os.path.normpath(filename))
    df         = pd.read_csv(filename, sep=sep, header=0)
    Up         = df.query('log2FoldChange >= 0.58 and pvalue <= 0.05 and padj <= 0.05')
    Down       = df.query('log2FoldChange <= -0.58 and pvalue <= 0.05 and padj <= 0.05')
    DE         = pd.concat(Up,Down)
    DE.to_csv('Filtered_set_' + name, sep='\t',index=False)

Any help or suggestions would be appreciated.

Upvotes: 0

Views: 202

Answers (1)

Parfait

Reputation: 107707

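You are calling a DataFrame-level method on a Series. DataFrame.apply applies a function to each row or column of a dataframe, and with axis=1 every row is passed to DEfilter as a Series, which has no query method. Do not pass the function to DataFrame.apply. Simply call the function directly and pass the whole dataframe as the argument. Note also that pd.concat expects a list of dataframes and that the function should return DE rather than df: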

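import os
import pandas as pd

path = '/home/pathtofile'
sep = '\t'  # assumption: the input files are tab-separated; adjust to match your data
files = os.listdir(path)
# Keep only the files whose names start with 'DE'
results = [os.path.join(path, i) for i in files if i.startswith('DE')]

def DEfilter(df):
    # query is a DataFrame method, so df must be the whole dataframe, not a single row
    Up_regulated = df.query('log2FoldChange >= 0.58 and pvalue <= 0.05 and padj <= 0.05')
    Down_regulated = df.query('log2FoldChange <= -0.58 and pvalue <= 0.05 and padj <= 0.05')
    # pd.concat takes a list of dataframes
    DE = pd.concat([Up_regulated, Down_regulated])
    return DE

for filename in results:
    df = pd.read_csv(filename, sep=sep, header=0)
    DE = DEfilter(df)

    name = os.path.basename(os.path.normpath(filename))
    DE.to_csv('Filtered_set_' + name, sep='\t', index=False)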
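
To see why the original call failed, here is a minimal sketch that rebuilds the three example rows from the question in memory (the in-memory dataframe and the names row_types and filtered are only for illustration, not part of the original code). It checks what DataFrame.apply passes to the function when axis=1, then calls DEfilter on the whole dataframe instead:

```python
import numpy as np
import pandas as pd

# The three example rows from the question, rebuilt by hand for illustration.
df = pd.DataFrame({
    "ID": ["ENSG2", "ENSG32", "ENSG85"],
    "Mean": [0.737466, 321.467787, 0.000000],
    "log2FoldChange": [-0.434579, -0.405760, np.nan],
    "SE": [0.484389, 0.170955, np.nan],
    "stat": [-0.897170, -2.373484, np.nan],
    "pvalue": [0.369628, 0.017621, np.nan],
    "padj": [0.607709, 0.097636, np.nan],
})

# With axis=1, apply hands each row to the function as a Series.
row_types = df.apply(lambda row: type(row).__name__, axis=1)
print(row_types.unique())

# query exists on DataFrame but not on Series, hence the AttributeError.
print(hasattr(pd.Series, "query"), hasattr(pd.DataFrame, "query"))

def DEfilter(df):
    # Same thresholds as in the question and answer.
    up = df.query('log2FoldChange >= 0.58 and pvalue <= 0.05 and padj <= 0.05')
    down = df.query('log2FoldChange <= -0.58 and pvalue <= 0.05 and padj <= 0.05')
    return pd.concat([up, down])

# Call the function directly on the whole dataframe.
filtered = DEfilter(df)
print(len(filtered))
```

None of the three example rows meets the thresholds, so the filtered frame is empty here. The row with NaN values is dropped as well, because comparisons against NaN evaluate to False.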

Upvotes: 2
