lawson
lawson

Reputation: 377

Comparing pandas columns with long list of tuples

Suppose we have a long list of tuples consisting of coordinates:

coords = 
[(61.0, 73, 94.0, 110.0),
(61.0, 110.0, 94.0, 148.0), 
(61.0, 148.0, 94.0, 202.0), 
(61.0, 202.0, 94.0, 241.0).......]

Our dataframe has mutiple columns including 'left, top, left1, top1' which correspond to coordinates.

    left    top     left1   top1
0   398     57.0    588     86
1   335     122.0   644     145
2   414     150.0   435     167
3   435     150.0   444     164
4   444     150.0   571     167
...     ...     ...     ...     ...

I'm wanting to check which rows fall within these coordinates. I'm currently doing this one tuple at a time as shown below but this is very slow.

for coord in coords:
    result = df.loc[(df['left']>(coord[0])) &
                    (df['left1']<=(coord[2])) & 
                    (df['top1']>(coord[1])) & 
                    (df['top']<(coord[3]))]

Is there a quicker way to do this?

Many thanks to all contributors in advance!

Upvotes: 5

Views: 205

Answers (2)

BENY
BENY

Reputation: 323316

We can do numpy broadcast

df=df.reindex(columns=['left','top1','left1','top'])
a=np.sign(df.values - np.array(coords)[:, None]) 
idx=np.any(np.all(np.sign(a)==np.array([1,1,-1,-1]),2),0) |np.any(np.all(np.sign(a)==np.array([1,1,0,-1]),2),0)
idx
Out[237]: array([ True, False, False, False, False])

df_sub=df[idx]

For example

df
Out[231]: 
    left    top1  left1    top
0  10000  100000      0    0.0
1    335     145    644  122.0
2    414     167    435  150.0
3    435     164    444  150.0
4    444     167    571  150.0

Your out put

for coord in coords:
    result = df.loc[(df['left']>(coord[0])) &
                    (df['left1']<=(coord[2])) & 
                    (df['top1']>(coord[1]))
                    & 
                    (df['top']<(coord[3]))]

result
Out[228]: 
    left  top  left1    top1
0  10000  0.0      0  100000

My out put

df_sub = df[idx]
print(df_sub)
    left    top1  left1  top
0  10000  100000      0  0.0

Upvotes: 1

Dave
Dave

Reputation: 2049

One way to do this is a cross join, followed by a filter:

coord_df = pd.DataFrame(coords, columns=['left', 'top', 'left1', 'top1'])

# We need to assign dummies for the cross join
coord_df['dummy'] = 1
df['dummy'] = 1

both_rects = df.merge(coord_df, key='dummy', suffixes=('_inner', '_outer'))

Then you're back to filtering by your original expression:

rects_within = both_rects.loc[(both_rects.left_inner > both_rects.left_outer) &
                              (both_rects.top_inner > both_rects.left_outer) &
                              (both_rects.left1_inner <= both_rects.left1_outer) &
                              (both_rects.top1_inner <= both_rects.top1_outer)]

Upvotes: 1

Related Questions