noste99
noste99

Reputation: 385

Pandas Boolean Where Filtering: How can I use it for creating a real subset of the data?

In my case I have 1002 traces (in columns) of each 2560 samples (the rows). I want to select a subset of traces for which the sample maximum is situated between two values.

    filter1=(CCC.max()>=11850)&(CCC.max()<=12150)
    DDD=CCC.where(filter1,axis=0)
    EEE=DDD.dropna(axis=0, how='all')

I hoped that EEE contained just the traces as explained above. When calculating the maximum value per trace in EEE I get again all maxima.

From what I obtain I have the impression that the where filter is only a sort of a mask on the complete dataframe and not a real subset of the dataframe.

How can I make a real subset by take these columns where the boolean parameter True is.

----> On request from lostCode an example clarifying my question:

data=([[0.5,1.0,2.0,3.0,0.7,5.0,4.0],
              [7.0,8.0,9.0,0.1,6.0,7.0,4.0],
              [1.0,2.0,3.0,0.4,5.0,6.0,7.0],
              [9.0,0.8,7.0,0.6,8.0,7.0,2.0]])
filter1=Data.max()<=8.0
Data=pd.DataFrame(data)
RRR=Data.where(filter1)
RRR.dropna()

The data array contains 4 rows being my traces of each 7 samples. The maxima per row are: row 0 5.0 row 1 9.0 row 2 7.0 row 3 9.0

Rows with max <=8.0 are row 0 and row 2.

I just want to copy the 2 rows in a dataframe I can then use for processing.

Upvotes: 0

Views: 169

Answers (2)

noste99
noste99

Reputation: 385

When trying to clarify my question I found that I should have inserted axis=1 in order to work on rows. By doing this I become the desired set of rows respecting the constraint. The code became:

data=([[0.5,1.0,2.0,3.0,0.7,5.0,4.0],
              [7.0,8.0,9.0,0.1,6.0,7.0,4.0],
              [1.0,2.0,3.0,0.4,5.0,6.0,7.0],
              [9.0,0.8,7.0,0.6,8.0,7.0,2.0]])
Data=pd.DataFrame(data)
filter1=Data.max(axis=1)<=8.0
RRR=Data[filter1]
RRR

Upvotes: 1

ansev
ansev

Reputation: 30930

import pandas as pd
import numpy as np
data=([[0.5,1,2,3,0.7,5,4],
              [7,8,9,0.1,6,7,4],
              [1,2,3,0.4,5,6,7],
              [9,0.8,7,0.6,8,7,2]])
Data=pd.DataFrame(data,columns='col1 col2 col3 col4 col5 col6 col7'.split())
RRR=pd.DataFrame(columns='col1 col2 col3 col4 col5 col6 col7'.split())
print(Data)
print('-'*100)
i=0
j=0
#this selects the rows that meet the condition and save it in RRR
while i<len(Data.index):
    if Data.loc[Data.index.values[i]].max()>8:
        RRR.loc[j]=Data.loc[Data.index.values[i]]
        j+=1
    i+=1
print(RRR)
print('-'*100)
print( 'max of colum 0: ',RRR.loc[0].max())
print('-'*100)
print( 'max of colum 1: ',RRR.loc[1].max())

Output:

   col1  col2  col3  col4  col5  col6  col7
0   0.5   1.0     2   3.0   0.7     5     4
1   7.0   8.0     9   0.1   6.0     7     4
2   1.0   2.0     3   0.4   5.0     6     7
3   9.0   0.8     7   0.6   8.0     7     2
----------------------------------------------------------------------------------------------------
   col1  col2  col3  col4  col5  col6  col7
0   7.0   8.0   9.0   0.1   6.0   7.0   4.0
1   9.0   0.8   7.0   0.6   8.0   7.0   2.0
----------------------------------------------------------------------------------------------------
max of colum 0:  9.0
----------------------------------------------------------------------------------------------------
max of colum 1:  9.0

Upvotes: 1

Related Questions