Jason K. George

Reputation: 63

Python: Filtering numpy values based on certain columns

I'm trying to create a method for evaluating co-ordinates for a project that's due in about a week.

Assume I'm working in a 3D Cartesian co-ordinate system whose points are stored as rows of a numpy array. Given predetermined 'x' (n[i, 0]) and 'y' (n[i, 1]) values, I am trying to find which 'z' (n[i, 2]) values exist for them.

When the x and y values I search for are scalars, I am happy with something like this:

import numpy as np

# Given that n is some (N, 3) numpy array of x, y, z rows
x, y = 2, 3
out = []
for i in range(n.shape[0]):
    if n[i, 0] == x and n[i, 1] == y:
        out.append(n[i, 2])
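For the scalar case, the loop can also be replaced by a boolean mask over whole columns. This is a minimal sketch, assuming n is an (N, 3) array with x, y, z columns; the sample data is made up for illustration:

```python
import numpy as np

# Hypothetical sample data: each row is (x, y, z)
n = np.array([[2, 3, 10],
              [1, 2, 11],
              [2, 3, 12],
              [4, 5, 13]])
x, y = 2, 3

# Element-wise comparisons on entire columns, combined with & (not `and`)
mask = (n[:, 0] == x) & (n[:, 1] == y)
out = n[mask, 2]
print(out)  # [10 12]
```

The comparisons run in compiled NumPy code instead of one Python-level `if` per row.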

However, the trouble starts when the search values are themselves rows of another numpy array, and I have to check which of them occur in the original numpy array 'n'.

# Given that n is the numpy array that is to be searched
# Given that x contains the 'search elements'
out = []
for i in range(n.shape[0]):
    for j in range(x.shape[0]):
        if n[i, 0] == x[j, 0] and n[i, 1] == x[j, 1]:
            out.append(n[i, 2])

The issue with doing it this way is that the 'n' array in my application may well exceed 100,000 rows, and the nested loop performs one Python-level comparison for every pair of rows in 'n' and 'x'.

Is there a more efficient way of performing this function?

Upvotes: 2

Views: 1095

Answers (2)

Deepak Saini

Reputation: 2910

Numpythonic solution without loops.

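This solution is intended for non-negative x and y coordinates. Even then, it is not an exact-match test: it checks whether the dot product of a row's (x, y) with a search point equals that point's squared norm, which is necessary but not sufficient for equality. For example, the search point (1, 2) would also match a row with (x, y) = (3, 1), since 1*3 + 2*1 = 5 = 1**2 + 2**2.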

import numpy as np
# Using a for x and b for n, to avoid confusion with x,y coordinates and array names
a = np.array([[1,2],[3,4]])
b = np.array([[1,2,10],[1,2,11],[3,4,12],[5,6,13],[3,4,14]])

# Pad a with a zero z column so it has the same width as b, then take the dot product with b transposed
a = np.insert(a,2,0,axis=1)
dot_product = np.dot(a,b.T)

# Squared norm of each search point in a, as a column vector, to compare against the dot products
sum_reshaped = np.sum(a**2,axis=1).reshape(a.shape[0],1)

# Match values for individual elements in a. Can be used if you want the z coordinates corresponding to each x, y separately
individual_indices = ( dot_product == np.tile(sum_reshaped,b.shape[0]) )

# Take the OR over each column and keep z if at least one x, y pair is present
indices = np.any(individual_indices, axis=0)
print(b[:,2][indices]) # prints [10 11 12 14]
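If an exact match on (x, y) is needed, one option is to compare the coordinate columns directly with broadcasting instead of going through the dot product. This is a sketch using the same a and b as above (redefined so it runs on its own). Be aware that it builds an intermediate boolean array of shape (len(b), len(a), 2), so memory grows with the product of the two lengths:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])  # search (x, y) points
b = np.array([[1, 2, 10], [1, 2, 11], [3, 4, 12], [5, 6, 13], [3, 4, 14]])

# b[:, None, :2] has shape (5, 1, 2) and a[None, :, :] has shape (1, 2, 2);
# comparing them broadcasts to (5, 2, 2): every row of b against every point of a.
pairwise = b[:, None, :2] == a[None, :, :]

# A row matches a point only if both x and y are equal (all over axis 2),
# and the row is kept if it matches at least one point (any over axis 1).
indices = pairwise.all(axis=2).any(axis=1)
print(b[indices, 2])  # [10 11 12 14]
```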

Upvotes: 0

today

Reputation: 33420

This should be more efficient than nested loops, since the comparison against all of n is vectorized and only the loop over x remains in Python:

out = []
for row in x:
    idx = np.equal(n[:,:2], row).all(1)
    out.extend(n[idx,2].tolist())

Note this assumes that x has shape (M, 2). If it has more than two columns, just change row to row[:2] in the loop body.
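When both n and x are large, the loop above still scans all of n once per search row. One alternative, offered as a sketch rather than a benchmarked recommendation, is to put the search pairs in a Python set and make a single pass over n, so each lookup is an average O(1) hash probe. The helper name `z_values_for` is hypothetical. Like the original loops, it relies on exact equality, so it suits integer or exactly representable coordinates:

```python
import numpy as np

def z_values_for(n, x):
    """Return z values of rows in n whose (x, y) pair appears in x.

    Hypothetical helper: n is an (N, 3) array, x is an (M, 2) or wider array.
    """
    # Hash the search pairs once; tolist() yields plain Python numbers.
    keys = set(map(tuple, x[:, :2].tolist()))
    # Single pass over n: build a boolean mask from set membership tests.
    mask = np.fromiter(
        (tuple(pair) in keys for pair in n[:, :2].tolist()),
        dtype=bool,
        count=n.shape[0],
    )
    return n[mask, 2]

n = np.array([[1, 2, 10], [1, 2, 11], [3, 4, 12], [5, 6, 13], [3, 4, 14]])
x = np.array([[1, 2], [3, 4]])
print(z_values_for(n, x))  # [10 11 12 14]
```

Unlike the loop over x, the result follows the row order of n, and duplicate rows in x do not produce duplicate z values.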

Upvotes: 1
