Reputation: 1272
I'm trying to return the rows which qualify a certain condition in a for
loop, into a new dataframe. So in short, I'd like to see row 4 (result for loop) in a new df. In addition, no duplicate rows. So, if 3 through 5 would be True, then every row should be returned once.
np.random.seed(0)
N = 5
df = pd.DataFrame(
{'X':np.random.uniform(-5,5,N),
'Y':np.random.uniform(-5,5,N),
'Z':np.random.uniform(-5,5,N),
})
df
for i in range(len(df)):
for k in range( i+1, len(df)+1 ):
df_sum = df[i:k].sum()
print( "rows", i, "to", k, (df_sum>-5).all() & (df_sum.sum()<1 ))
I'd like to return this;
Upvotes: 0
Views: 1042
Reputation: 862901
Another solution is create helper df
and then use isin
:
df = pd.DataFrame(
{'X':np.random.uniform(-5,5,N),
'Y':np.random.uniform(-5,5,N),
'Z':np.random.uniform(-5,5,N),
})
print (df)
arr = []
for i in range(len(df)):
for k in range( i+1, len(df)+1 ):
df_sum = df[i:k].sum()
val = (df_sum>-5).all() & (df_sum.sum()<1 )
print( "rows", i, "to", k, val)
arr.append([i,val])
#print (arr)
df1 = pd.DataFrame(arr, columns=['ROWS','MASK']).set_index('ROWS')
idx = df1[df1.MASK].index
print (idx)
Int64Index([4], dtype='int64', name='ROWS')
print (df[df.index.isin(idx)])
X Y Z
4 -0.763452 -1.165585 -4.289639
All together with multiple DataFrames
:
np.random.seed(0)
N = 5
df1 = pd.DataFrame(
{'X':np.random.uniform(-5,5,N),
'Y':np.random.uniform(-5,5,N),
'Z':np.random.uniform(-5,5,N),
})
N = 6
df2 = pd.DataFrame(
{'X':np.random.uniform(-5,5,N),
'Y':np.random.uniform(-5,5,N),
'Z':np.random.uniform(-5,5,N),
})
N = 7
df3 = pd.DataFrame(
{'X':np.random.uniform(-5,5,N),
'Y':np.random.uniform(-5,5,N),
'Z':np.random.uniform(-5,5,N),
})
L = [df1, df2, df3]
print (L)
dfs = {}
for j, df in enumerate(L):
arr = []
for i in range(len(df)):
for k in range( i+1, len(df)+1 ):
df_sum = df[i:k].sum()
val = (df_sum>-5).all() & (df_sum.sum()<1 )
#print( "rows", i, "to", k, val)
arr.append([i,val])
df1 = pd.DataFrame(arr, columns=['ROWS','MASK']).set_index('ROWS')
idx = df1[df1.MASK].index
#print (df[df.index.isin(idx)])
dfs['df' + str(j + 1)] = df[df.index.isin(idx)]
print (dfs)
{'df1': X Y Z
4 -0.763452 -1.165585 -4.289639, 'df2': X Y Z
0 -4.128707 2.991586 4.446689
1 -4.797816 -0.385206 0.218483
3 2.781568 -3.817256 -2.354444
5 4.786183 -3.566467 -0.438497, 'df3': X Y Z
0 0.684339 -1.404921 -3.710737
1 -4.812102 -0.629680 -1.845716
2 1.176355 1.976312 -1.362892
3 1.120957 -4.397745 0.701968
6 1.818203 -2.896174 -3.979552}
print (dfs['df1'])
X Y Z
4 -0.763452 -1.165585 -4.289639
Upvotes: 1
Reputation: 2978
Try this ..
df_result = df[0:0] # new 1
for i in range(len(df)):
for k in range( i+1, len(df)+1 ):
df_sum = df[i:k].sum()
print( "rows", i, "to", k, (df_sum>-5).all() & (df_sum.sum()<1 ))
if ((df_sum>-5).all() & (df_sum.sum()<1 )): # new 2
df_result = df_result.append(df[i:k]) # new 3
df_result.drop_duplicates() # new 4
The first new line creates an empty DataFrame but preserving the column structure of the original. The second and third new lines append the rows to the result DataFrame if they meet your condition. The final new line removes the duplicates.
Also, note that this solution is not the most performant solution because appending rows to a DataFrame is inefficient. In case performance becomes an issue, you may want to convert df_result to a dictionary and convert it to a DataFrame at the very end.
Upvotes: 1