Reputation: 385
In my case I have 1002 traces (in columns) of each 2560 samples (the rows). I want to select a subset of traces for which the sample maximum is situated between two values.
filter1=(CCC.max()>=11850)&(CCC.max()<=12150)
DDD=CCC.where(filter1,axis=0)
EEE=DDD.dropna(axis=0, how='all')
I hoped that EEE contained just the traces as explained above. When calculating the maximum value per trace in EEE I get again all maxima.
From what I obtain I have the impression that the where filter is only a sort of a mask on the complete dataframe and not a real subset of the dataframe.
How can I make a real subset by take these columns where the boolean parameter True is.
----> On request from lostCode an example clarifying my question:
data=([[0.5,1.0,2.0,3.0,0.7,5.0,4.0],
[7.0,8.0,9.0,0.1,6.0,7.0,4.0],
[1.0,2.0,3.0,0.4,5.0,6.0,7.0],
[9.0,0.8,7.0,0.6,8.0,7.0,2.0]])
filter1=Data.max()<=8.0
Data=pd.DataFrame(data)
RRR=Data.where(filter1)
RRR.dropna()
The data array contains 4 rows being my traces of each 7 samples. The maxima per row are: row 0 5.0 row 1 9.0 row 2 7.0 row 3 9.0
Rows with max <=8.0 are row 0 and row 2.
I just want to copy the 2 rows in a dataframe I can then use for processing.
Upvotes: 0
Views: 169
Reputation: 385
When trying to clarify my question I found that I should have inserted axis=1 in order to work on rows. By doing this I become the desired set of rows respecting the constraint. The code became:
data=([[0.5,1.0,2.0,3.0,0.7,5.0,4.0],
[7.0,8.0,9.0,0.1,6.0,7.0,4.0],
[1.0,2.0,3.0,0.4,5.0,6.0,7.0],
[9.0,0.8,7.0,0.6,8.0,7.0,2.0]])
Data=pd.DataFrame(data)
filter1=Data.max(axis=1)<=8.0
RRR=Data[filter1]
RRR
Upvotes: 1
Reputation: 30930
import pandas as pd
import numpy as np
data=([[0.5,1,2,3,0.7,5,4],
[7,8,9,0.1,6,7,4],
[1,2,3,0.4,5,6,7],
[9,0.8,7,0.6,8,7,2]])
Data=pd.DataFrame(data,columns='col1 col2 col3 col4 col5 col6 col7'.split())
RRR=pd.DataFrame(columns='col1 col2 col3 col4 col5 col6 col7'.split())
print(Data)
print('-'*100)
i=0
j=0
#this selects the rows that meet the condition and save it in RRR
while i<len(Data.index):
if Data.loc[Data.index.values[i]].max()>8:
RRR.loc[j]=Data.loc[Data.index.values[i]]
j+=1
i+=1
print(RRR)
print('-'*100)
print( 'max of colum 0: ',RRR.loc[0].max())
print('-'*100)
print( 'max of colum 1: ',RRR.loc[1].max())
Output:
col1 col2 col3 col4 col5 col6 col7
0 0.5 1.0 2 3.0 0.7 5 4
1 7.0 8.0 9 0.1 6.0 7 4
2 1.0 2.0 3 0.4 5.0 6 7
3 9.0 0.8 7 0.6 8.0 7 2
----------------------------------------------------------------------------------------------------
col1 col2 col3 col4 col5 col6 col7
0 7.0 8.0 9.0 0.1 6.0 7.0 4.0
1 9.0 0.8 7.0 0.6 8.0 7.0 2.0
----------------------------------------------------------------------------------------------------
max of colum 0: 9.0
----------------------------------------------------------------------------------------------------
max of colum 1: 9.0
Upvotes: 1