Returning rows from a dataframe evaluated by a for loop

Question

I'm trying to return the rows which qualify a certain condition in a for loop, into a new dataframe. So in short, I'd like to see row 4 (result for loop) in a new df. In addition, no duplicate rows. So, if 3 through 5 would be True, then every row should be returned once.

np.random.seed(0)
N = 5

df = pd.DataFrame(
    {'X':np.random.uniform(-5,5,N),
     'Y':np.random.uniform(-5,5,N),
     'Z':np.random.uniform(-5,5,N),
    })
df

for i in range(len(df)):
    for k in range( i+1, len(df)+1 ):
        df_sum = df[i:k].sum()
        print( "rows", i, "to", k, (df_sum>-5).all() & (df_sum.sum()<1 ))

I'd like to return this;

jezrael · Accepted Answer

Another solution is create helper df and then use isin:

df = pd.DataFrame(
    {'X':np.random.uniform(-5,5,N),
     'Y':np.random.uniform(-5,5,N),
     'Z':np.random.uniform(-5,5,N),
    })
print (df)

arr = []
for i in range(len(df)):
    for k in range( i+1, len(df)+1 ):
        df_sum = df[i:k].sum()
        val = (df_sum>-5).all() & (df_sum.sum()<1 )
        print( "rows", i, "to", k, val)
        arr.append([i,val])
#print (arr)

df1 = pd.DataFrame(arr, columns=['ROWS','MASK']).set_index('ROWS')
idx = df1[df1.MASK].index

print (idx)
Int64Index([4], dtype='int64', name='ROWS')

print (df[df.index.isin(idx)])
          X         Y         Z
4 -0.763452 -1.165585 -4.289639

All together with multiple DataFrames:

np.random.seed(0)
N = 5
df1 = pd.DataFrame(
    {'X':np.random.uniform(-5,5,N),
     'Y':np.random.uniform(-5,5,N),
     'Z':np.random.uniform(-5,5,N),
    })
N = 6
df2 = pd.DataFrame(
    {'X':np.random.uniform(-5,5,N),
     'Y':np.random.uniform(-5,5,N),
     'Z':np.random.uniform(-5,5,N),
    })
N = 7
df3 = pd.DataFrame(
    {'X':np.random.uniform(-5,5,N),
     'Y':np.random.uniform(-5,5,N),
     'Z':np.random.uniform(-5,5,N),
    })

L = [df1, df2, df3]
print (L)

dfs = {}
for j, df in enumerate(L):
    arr = []
    for i in range(len(df)):
        for k in range( i+1, len(df)+1 ):
            df_sum = df[i:k].sum()
            val = (df_sum>-5).all() & (df_sum.sum()<1 )
            #print( "rows", i, "to", k, val)
            arr.append([i,val])
    df1 = pd.DataFrame(arr, columns=['ROWS','MASK']).set_index('ROWS')
    idx = df1[df1.MASK].index
    #print (df[df.index.isin(idx)])
    dfs['df' + str(j + 1)] = df[df.index.isin(idx)]

print (dfs)
{'df1':           X         Y         Z
4 -0.763452 -1.165585 -4.289639, 'df2':           X         Y         Z
0 -4.128707  2.991586  4.446689
1 -4.797816 -0.385206  0.218483
3  2.781568 -3.817256 -2.354444
5  4.786183 -3.566467 -0.438497, 'df3':           X         Y         Z
0  0.684339 -1.404921 -3.710737
1 -4.812102 -0.629680 -1.845716
2  1.176355  1.976312 -1.362892
3  1.120957 -4.397745  0.701968
6  1.818203 -2.896174 -3.979552}

print (dfs['df1'])
          X         Y         Z
4 -0.763452 -1.165585 -4.289639

Returning rows from a dataframe evaluated by a for loop

Answers (2)

Related Questions