Reputation: 2584
In Python, how would I do this?
Say I have:
a = [[1, 5], [2, 6], [3, 3], [4, 2]]
b = [[3, 1], [4, 2], [1, 8], [2, 4]]
Now I want to do an operation on the second-column values IF the first-column values match.
E.g. a has an entry [1, 5]; going through b, I find it has an entry [1, 8], so I want to divide 5/8 and store that value in, say, array c. Next would be matching [2, 6] with [2, 4] and getting the next value in c: 6/4.
So, for the example above:
c = [5/8, 6/4, 3/1, 2/2]
I hope this makes sense. I would like to do this with NumPy and Python.
Upvotes: 2
Views: 198
Reputation: 25813
I humbly propose that you're using the wrong data structure. Notice that if you have an array column that holds unique values between 1 and N (an index column), you could encode the same data simply by re-ordering your other columns. Once you've re-ordered your data, not only can you drop the "index" column, but it also becomes easier to operate on the remaining data. Let me demonstrate:
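import numpy as np
N = 5  # one more than the largest key, so keys 1..4 can be used directly as indices
a = np.array([[1, 5], [2, 6], [3, 3], [4, 2]])
b = np.array([[3, 1], [4, 2], [1, 8], [2, 4]])
# Scatter each second-column value into the slot named by its first-column key.
a_trans = np.ones(N)
a_trans[a[:, 0]] = a[:, 1]
b_trans = np.ones(N)
b_trans[b[:, 0]] = b[:, 1]
c = a_trans / b_trans
# Slot 0 is unused padding (1/1), so skip it when printing.
print(c[1:])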
Depending on the nature of your problem, you can sometimes use an implicit index from the beginning, but sometimes an explicit index can be very useful. If you need an explicit index, consider using something like pandas.DataFrame, which has better support for index operations.
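As a rough illustration of the explicit-index idea, here is a minimal sketch that uses pandas Series (the one-column counterpart of DataFrame) with the first column as the index. The names a_s and b_s are made up for this example, and it assumes the keys are unique in each array and that every key of a also appears in b.

```python
import numpy as np
import pandas as pd

a = np.array([[1, 5], [2, 6], [3, 3], [4, 2]])
b = np.array([[3, 1], [4, 2], [1, 8], [2, 4]])

# Use the first column as the index and the second column as the values.
a_s = pd.Series(a[:, 1], index=a[:, 0])
b_s = pd.Series(b[:, 1], index=b[:, 0])

# Division aligns the two Series on their index labels, not on position.
# Selecting with a's keys afterwards puts the result in a's row order.
c = (a_s / b_s).loc[a[:, 0]]
print(c.tolist())
```

Because the alignment is done by label, the row order of b does not matter here.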
Upvotes: 0
Reputation: 221524
You can use np.searchsorted to get the positions where b's first column elements correspond to a's first column elements and, using those, get the respective second column elements for division and finally get c. Thus, assuming a and b to be NumPy arrays, the vectorized implementation would be -
a0 = a[:,0]
c = np.true_divide(a[:,1],b[np.searchsorted(a0,b[:,0],sorter=a0.argsort()),1])
The approach listed above works for a generic case when the first column elements of a are not necessarily sorted. But if they are sorted, as in the listed sample case, you can simply omit the sorter input argument and have a simplified solution, like so -
c = np.true_divide(a[:,1],b[np.searchsorted(a0,b[:,0]),1])
Sample run -
In [35]: a
Out[35]:
array([[1, 5],
       [2, 6],
       [3, 3],
       [4, 2]])
In [36]: b
Out[36]:
array([[3, 1],
       [4, 2],
       [1, 8],
       [2, 4]])
In [37]: a0 = a[:,0]
In [38]: np.true_divide(a[:,1],b[np.searchsorted(a0,b[:,0],sorter=a0.argsort()),1])
Out[38]: array([ 0.625, 1.5 , 3. , 1. ])
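One caveat worth checking on your own data: the one-liner above looks up b's keys inside a and then uses those positions to index b. That gives the expected result for the sample, where b's row order happens to be a permutation that is its own inverse, but it may not for other orderings. A hedged alternative is to search b's keys for each key of a and map the sorted positions back through the argsort permutation. The sketch below is only an illustration, not the answer's code: the three-row arrays and the names b0, order, pos and rows are made up, and it assumes every key of a appears exactly once in b.

```python
import numpy as np

# Made-up data: b's rows are in an order that is not its own inverse.
a = np.array([[1, 5], [2, 6], [3, 3]])
b = np.array([[2, 4], [3, 1], [1, 8]])

b0 = b[:, 0]
order = b0.argsort()                               # permutation that sorts b's keys
pos = np.searchsorted(b0, a[:, 0], sorter=order)   # position of each a key in sorted b0
rows = order[pos]                                  # row of b that holds each a key
c = np.true_divide(a[:, 1], b[rows, 1])
print(c)
```

Here rows maps each row of a to the matching row of b, so c comes out in a's row order.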
Upvotes: 4
Reputation: 11602
Given all of the assumptions in the comment section, this will work:
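# A __future__ import must come before any other statement in the module.
from __future__ import division
from operator import itemgetter
a = [[1, 5], [2, 6], [3, 3], [4, 2]]
b = [[3, 1], [4, 2], [1, 8], [2, 4]]
# Sort b by its first element so its rows line up with the already-sorted a.
result = [x / y for (_, x), (_, y) in zip(a, sorted(b, key=itemgetter(0)))]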
Assumptions: the lists have equal lengths, the elements in the first position are unique within each list, the first list is sorted by its first element, and every element that occurs in the first position in a also occurs in the first position in b.
Upvotes: 4
Reputation: 2073
You can use a simple O(n^2) way with nested loops:
c = []
for x in a:
    for y in b:
        if x[0] == y[0]:
            # float() avoids integer division on Python 2
            c.append(float(x[1]) / y[1])
            break
The above is useful when the lists are small. For huge lists, consider a dictionary based approach, where the complexity would be O(n) at the cost of some extra space.
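A minimal sketch of the dictionary-based approach could look like the following. The name b_lookup is made up for this example, and it assumes the first-column values in b are unique; rows of a whose key is missing from b are skipped.

```python
a = [[1, 5], [2, 6], [3, 3], [4, 2]]
b = [[3, 1], [4, 2], [1, 8], [2, 4]]

# Build a key -> value mapping from b in one pass: O(n) time, O(n) extra space.
b_lookup = {key: value for key, value in b}

# One pass over a with constant-time lookups; float() keeps true division on Python 2.
c = [float(x) / b_lookup[key] for key, x in a if key in b_lookup]
print(c)
```

The results follow the row order of a, regardless of how b is ordered.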
Upvotes: 1