Numpy/Pandas: Merge two numpy arrays based on one array efficiently

Question

I have two numpy arrays, each made up of two-element tuples:

a = [(1, "alpha"), (2, 3), ...]
b = [(1, "zylo"), (1, "xen"), (2, "potato"), ...]

The first element of each tuple is an identifier shared by both arrays, so I want to create a new numpy array that looks like this:

[(1, "alpha", "zylo", "xen"), (2, 3, "potato"), etc...]

My current solution works, but it's far too slow for my data. It looks like this:

aggregate_collection = []
for tuple_set in a:
  for tuple_set2 in b:
    if tuple_set[0] == tuple_set2[0] and other_condition:
      temp_tup = (tuple_set[0], other tuple values)
      aggregate_collection.append(temp_tup)

How can I do this efficiently?

hpaulj · Accepted Answer

In [278]: a = [(1, "alpha"), (2, 3)]
     ...: b = [(1, "zylo"), (1, "xen"), (2, "potato")]
In [279]: a
Out[279]: [(1, 'alpha'), (2, 3)]
In [280]: b
Out[280]: [(1, 'zylo'), (1, 'xen'), (2, 'potato')]

Note that if I try to make an array from a, I get something quite different: numpy coerces every element to a string.

In [281]: np.array(a)
Out[281]: 
array([['1', 'alpha'],
       ['2', '3']], dtype='<U21')
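
If the mixed types need to survive the conversion, one option is to ask for an object array explicitly. This is only a sketch to illustrate the coercion shown above; the names as_str and as_obj are mine, not from the question.

```python
import numpy as np

a = [(1, "alpha"), (2, 3)]

# Default conversion: numpy picks one common dtype, so the ints become strings.
as_str = np.array(a)

# With dtype=object, each cell keeps the original Python object.
as_obj = np.array(a, dtype=object)

print(as_str.dtype.kind)      # 'U', i.e. a unicode string dtype
print(type(as_obj[1, 1]))     # still a Python int
```

An object array keeps the values intact, but it gives up most of numpy's speed advantage, so it is rarely better than a plain list here.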


defaultdict is a handy tool for collecting like-keyed values:
In [283]: from collections import defaultdict
In [284]: dd = defaultdict(list)
In [285]: for tup in a+b:
     ...:     k,v = tup
     ...:     dd[k].append(v)
     ...: 
In [286]: dd
Out[286]: defaultdict(list, {1: ['alpha', 'zylo', 'xen'], 2: [3, 'potato']})

which can be cast as a list of tuples with:
In [288]: [(k,*v) for k,v in dd.items()]
Out[288]: [(1, 'alpha', 'zylo', 'xen'), (2, 3, 'potato')]
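
For reuse outside an interactive session, the two steps above could be wrapped in a function. This is a minimal sketch, assuming every input item is a (key, value) pair; merge_by_key is a hypothetical name, and itertools.chain is swapped in for a+b so the inputs are not copied into a new list first.

```python
from collections import defaultdict
from itertools import chain


def merge_by_key(*collections):
    """Group (key, value) pairs from any number of iterables by key."""
    grouped = defaultdict(list)
    # Walk all inputs in one pass; each item must unpack into exactly two parts.
    for key, value in chain.from_iterable(collections):
        grouped[key].append(value)
    # Dicts keep insertion order, so keys appear in first-seen order.
    return [(key, *values) for key, values in grouped.items()]


a = [(1, "alpha"), (2, 3)]
b = [(1, "zylo"), (1, "xen"), (2, "potato")]
merged = merge_by_key(a, b)
print(merged)
```

This does one pass over the combined input instead of comparing every tuple in a against every tuple in b, which is where the nested loop in the question loses time.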

I'm using a+b to join the lists, since it apparently doesn't matter which list a tuple comes from.
Out[288] is a poor fit for numpy, though, since the tuples differ in length and the items (other than the first) may be strings or numbers.
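
If a numpy array is still required for the merged result, about the only workable shape is a 1-D object array with one tuple per slot. The sketch below assumes the merged list from Out[288]; filling the array element by element is my choice here, made so that numpy never tries to interpret the tuples as rows.

```python
import numpy as np

merged = [(1, "alpha", "zylo", "xen"), (2, 3, "potato")]

# Allocate one object slot per tuple, then store each tuple as a single element.
arr = np.empty(len(merged), dtype=object)
for i, tup in enumerate(merged):
    arr[i] = tup

print(arr.shape, arr.dtype)
```

Such an array is little more than a list with extra overhead, which is why the plain list of tuples is usually the better container for this kind of result.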
