Reputation: 29
I'm working on a machine vision project. I reflect laser light onto the scene and use OpenCV to detect the pixels of the image that the laser light falls on. I store these pixel coordinates as a 2D NumPy array. However, I want to make the x values unique: I want to find the pixels that share the same x value and replace them with their average. The pixel coordinates are stored in the array in order.
For example:
[[659 253]
[660 253]
[660 256]
[661 253]
[662 253]
[663 253]
[664 253]
[665 253]]
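For the averaging step described above, here is a minimal sketch. It assumes the goal is one row per distinct x holding the mean of the y values that share that x. It uses np.unique with return_inverse and return_counts, and np.bincount with weights to sum the y values of each group. The helper name mean_y_per_x is hypothetical.

```python
import numpy as np

def mean_y_per_x(points):
    """Collapse rows that share the same x (column 0) into one row with the mean y (column 1).

    Hypothetical helper; assumes `points` is an (N, 2) array of (x, y) pixel coordinates.
    """
    # xs: sorted distinct x values; inv[k]: position of points[k, 0] in xs; counts: rows per x
    xs, inv, counts = np.unique(points[:, 0], return_inverse=True, return_counts=True)
    # Sum the y values of each group, then divide by the group size to get the mean
    mean_y = np.bincount(inv, weights=points[:, 1]) / counts
    return np.column_stack((xs, mean_y))

pts = np.array([[659, 253], [660, 253], [660, 256], [661, 253],
                [662, 253], [663, 253], [664, 253], [665, 253]])
print(mean_y_per_x(pts))
```

With the example pixels, x = 660 collapses to a single row whose y is 254.5 (the mean of 253 and 256); every other x keeps y = 253. The result is a float array because of the division.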
As a first step, I want to identify all rows that have the same first element. OpenCV works with NumPy arrays, so the pixel coordinates already come as one, and I'm trying to write the indexing myself. To keep things simple, I created a small test array:
import numpy as np
x = np.array([[1, 2], [1, 78], [1, 3], [1, 6], [4, 3], [5, 6], [5, 3]], np.int32)
I tried the following loop to find the rows of x that share the same first element:
for i in range(len(x)):
    if x[i] != x[-1] and x[i][0] == x[i + 1][0]:
        print(x[i], x[i + 1])
My idea is to walk through x and check whether the first element of each row is also the first element of the next row. To avoid an index-out-of-range error on the last row, I added the check x[i] != x[-1]. I expected this loop to print:
[1,2] [1,78]
[1,78] [1,3]
[1,3] [1,6]
[5,6] [5,3]
I planned to remove the duplicate rows afterwards, but instead I got:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I'm not familiar with NumPy arrays, so I couldn't work out a solution. Is it possible to get the result I want using NumPy array methods? Thanks for your time.
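The error comes from x[i] != x[-1]: comparing two rows is element-wise, so it yields a boolean array such as array([ True,  True]), and `and` cannot reduce a two-element array to a single True/False. Below is a minimal sketch of the intended loop that drops that comparison and simply stops one row before the end, so x[i + 1] never goes out of range. The function name consecutive_pairs is hypothetical.

```python
import numpy as np

def consecutive_pairs(x):
    """Return (row, next_row) pairs of adjacent rows whose first elements are equal."""
    pairs = []
    # Stop at len(x) - 1 so that x[i + 1] is always a valid index
    for i in range(len(x) - 1):
        # Compare scalars (the first elements), not whole rows, so the result is a single bool
        if x[i][0] == x[i + 1][0]:
            pairs.append((x[i].tolist(), x[i + 1].tolist()))
    return pairs

x = np.array([[1, 2], [1, 78], [1, 3], [1, 6], [4, 3], [5, 6], [5, 3]], np.int32)
for a, b in consecutive_pairs(x):
    print(a, b)
```

This only finds equal first elements in adjacent rows, which matches the expected output because the rows are stored in order.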
Upvotes: 0
Views: 1042
Reputation: 114578
You can use np.unique with its return_inverse argument, which labels every row with the group its first element belongs to, and return_counts, which is going to help build the split points:
_, inv, cnt = np.unique(x[:, 0], return_inverse=True, return_counts=True)
inv[k] is the position of x[k, 0] among the sorted unique values, so it is a group label rather than a sorting index. A stable np.argsort of those labels turns it into one: it brings equal first elements together and keeps the rows of each group in their original order:
ind = np.argsort(inv, kind='stable')
To get the split points of the data, you can use np.cumsum on the counts. You don't need the last element, because it always marks the end of the array:
spp = np.cumsum(cnt[:-1])
Finally, you can use np.split to get the list of sub-arrays that you want:
result = np.split(x[ind, :], spp, axis=0)
TL;DR
_, inv, cnt = np.unique(x[:, 0], return_inverse=True, return_counts=True)
np.split(x[np.argsort(inv, kind='stable'), :], np.cumsum(cnt[:-1]), axis=0)
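A self-contained sketch of this np.unique-based grouping, run on the question's test array. It uses the return_inverse labels with a stable np.argsort as the sorting index and splits at the cumulative counts. The function name group_by_first_column is hypothetical.

```python
import numpy as np

def group_by_first_column(x):
    """Split the rows of x into sub-arrays that share the same value in column 0."""
    # inv labels each row with its group; cnt holds the size of each group (in sorted-key order)
    _, inv, cnt = np.unique(x[:, 0], return_inverse=True, return_counts=True)
    # A stable argsort of the labels brings each group together without reordering its rows
    order = np.argsort(inv, kind='stable')
    # Group boundaries are the running totals of the counts, minus the last one (the array end)
    return np.split(x[order], np.cumsum(cnt[:-1]), axis=0)

x = np.array([[1, 2], [1, 78], [1, 3], [1, 6], [4, 3], [5, 6], [5, 3]], np.int32)
for group in group_by_first_column(x):
    print(group.tolist())
```

Because the sort is stable, the same function also works when rows with equal first elements are not adjacent in the input.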
Upvotes: 0
Reputation: 5949
Approach 1
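This is a pure NumPy way to do it:
x_sorted = x[np.argsort(x[:, 0], kind='stable')]  # stable sort keeps each group's rows in their original order
marker_idx = np.flatnonzero(np.diff(x_sorted[:, 0])) + 1  # indices where the first element changes
output = np.split(x_sorted, marker_idx)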
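A runnable sketch of Approach 1, using rows that are deliberately out of order so the effect of the sort is visible. It passes kind='stable' to np.argsort so that rows sharing a first element keep their relative order; the default sort kind does not guarantee that. The function name split_by_first_column is hypothetical.

```python
import numpy as np

def split_by_first_column(x):
    """Group rows of x by column 0 using a sort plus np.diff to find group boundaries."""
    # Stable sort by the first column keeps the original row order inside each group
    x_sorted = x[np.argsort(x[:, 0], kind='stable')]
    # np.diff is nonzero exactly where the first element changes; +1 turns that into a split index
    marker_idx = np.flatnonzero(np.diff(x_sorted[:, 0])) + 1
    return np.split(x_sorted, marker_idx)

# Rows deliberately out of order to show that the sort brings each group together
x = np.array([[5, 6], [1, 2], [4, 3], [1, 78], [5, 3]], np.int32)
for group in split_by_first_column(x):
    print(group.tolist())
```

Compared with the np.unique approach, this needs only one sort and one np.diff, and the split indices come straight from where consecutive first elements differ.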
Approach 2
You can also use the numpy_indexed package, which is designed to solve group-by problems with less code and without loss of performance:
import numpy_indexed as npi
npi.group_by(x[:, 0]).split(x)
Approach 3
You can also get the groups of row indices with pandas, but this might not be the best option, because the list comprehension loops over the groups in Python:
import pandas as pd
[x[idx] for idx in pd.DataFrame(x).groupby([0]).indices.values()]
Output
[array([[ 1, 2],
[ 1, 78],
[ 1, 3],
[ 1, 6],
[ 1, 234]]),
array([[4, 3]]),
array([[5, 6],
[5, 3]])]
Upvotes: 2
Reputation: 10624
Try the following, using itertools.groupby:
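import itertools
x = x[np.argsort(x[:, 0], kind='stable')]  # sort whole rows by their first element; x.sort(axis=0) would sort each column separately and break up the pairs
for _, group in itertools.groupby(x, key=lambda row: row[0]):
    print([tuple(p.tolist()) for p in group])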
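A self-contained sketch of the itertools.groupby approach on the question's array. itertools.groupby only merges adjacent items, so the rows are first sorted by their first element with a stable np.argsort on column 0. Sorting with x.sort(axis=0) instead would sort each column independently and scramble the (x, y) pairs. The function name group_rows is hypothetical, and .tolist() is used so the tuples hold plain Python ints.

```python
import itertools

import numpy as np

def group_rows(x):
    """Return a list of groups, each a list of (first, second) tuples sharing the same first element."""
    # groupby only groups consecutive items, so sort the rows by column 0 first
    rows = x[np.argsort(x[:, 0], kind='stable')]
    return [[tuple(row.tolist()) for row in group]
            for _, group in itertools.groupby(rows, key=lambda row: row[0])]

x = np.array([[1, 2], [1, 78], [1, 3], [1, 6], [4, 3], [5, 6], [5, 3]], np.int32)
for group in group_rows(x):
    print(group)
```

This loops over the rows in Python, so for large pixel arrays the vectorized NumPy approaches above are likely to be faster.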
Output:
[(1, 2), (1, 3), (1, 4), (1, 5), (1, 6)]
[(3, 6), (3, 78)]
[(5, 234)]
Upvotes: 0