Jennifer Murguia

Reputation: 11

merge arrays in python based on a similar value

I want to merge two arrays in Python based on the first element of each row of each array.

For example,

A = ([[1, 2, 3],
[4, 5, 6],
[4, 6, 7],
[5, 7, 8], 
[5, 9, 1]])

B = ([[1, .002],
[4, .005],
[5, .006]])

So that I get an array

C = ([[1, 2, 3, .002],
[4, 5, 6, .005],
[4, 6, 7, .005],
[5, 7, 8, .006],
[5, 9, 1, .006]])

For more clarity:

The first column of A is 1, 4, 4, 5, 5 and the first column of B is 1, 4, 5.

So the 1 in A matches the 1 in B and that row gets .002; likewise, each 4 gets .005 and each 5 gets .006.

How would I do this in python? Any suggestions would be great.

Upvotes: 1

Views: 385

Answers (5)

mattbasta

Reputation: 13709

The naive, simple way:

for alist in A:
    for blist in B:
        if blist[0] == alist[0]:
            alist.extend(blist[1:])
            # alist.append(blist[1]) if B will only ever contain 2-tuples.
            break  # Remove this if you want to append more than one.

The downside is that this has O(N^2) complexity (more precisely, O(len(A) * len(B))). For most small data sets, that should be OK. If you're looking for something that scales better, you'll probably want to look at @mgilson's answer. Some comparison:

  1. His answer converts everything in B to a dict and slices each element of B. If B has a lot of rows, that could be expensive. The loop above uses the existing lists (you're only looking at the first value, anyway).
  2. Because he's using a dict, he gets O(1) lookup times (his answer also assumes that you're never going to append multiple values to the end of the rows in A). That means that, overall, his algorithm achieves O(N). You'll need to weigh whether the overhead of creating a dict outweighs the cost of iterating over the values in B.
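
To make that trade-off a bit more concrete, here is a minimal sketch that counts the key comparisons made by the nested loop against the lookups made by a dict-based merge, using the data from the question. The helper names `merge_nested` and `merge_with_dict` and the counters are hypothetical, added only for illustration; real timings will depend on your data.

```python
import copy

A = [[1, 2, 3], [4, 5, 6], [4, 6, 7], [5, 7, 8], [5, 9, 1]]
B = [[1, .002], [4, .005], [5, .006]]


def merge_nested(a_rows, b_rows):
    """Nested-loop merge; returns (merged rows, number of key comparisons)."""
    merged = copy.deepcopy(a_rows)  # work on a copy so the caller's A is untouched
    comparisons = 0
    for a_row in merged:
        for b_row in b_rows:
            comparisons += 1
            if b_row[0] == a_row[0]:
                a_row.extend(b_row[1:])
                break  # stop at the first match
    return merged, comparisons


def merge_with_dict(a_rows, b_rows):
    """Dict-based merge; returns (merged rows, number of dict lookups)."""
    lookup = {b_row[0]: b_row[1:] for b_row in b_rows}
    merged = [a_row + lookup.get(a_row[0], []) for a_row in a_rows]
    return merged, len(a_rows)  # one hash lookup per row of A


nested_result, nested_count = merge_nested(A, B)
dict_result, dict_count = merge_with_dict(A, B)

print(nested_result == dict_result)  # both approaches should agree
print(nested_count, dict_count)
```

With only three rows in B the difference is negligible; the gap grows as B gets longer, since the nested loop may scan all of B for every row of A.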

Upvotes: 0

pemistahl

Reputation: 9584

Here is a solution using itertools.product() that avoids having to create a dictionary for B:

In [1]: from itertools import product

In [2]: [lst_a + lst_b[1:] for (lst_a, lst_b) in product(A, B) if lst_a[0] == lst_b[0]]
Out[2]:
[[1, 2, 3, 0.002],
 [4, 5, 6, 0.005],
 [4, 6, 7, 0.005],
 [5, 7, 8, 0.006],
 [5, 9, 1, 0.006]]
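
Two things are worth keeping in mind with this approach: product() still visits every (row of A, row of B) pair, and a row of A whose key has no partner in B is silently dropped from the result. A small sketch of both points, assuming a made-up row with key 6 that does not appear in B:

```python
from itertools import product

A = [[1, 2, 3], [4, 5, 6], [6, 9, 1]]   # key 6 has no partner in B (made-up row)
B = [[1, .002], [4, .005], [5, .006]]

# Same comprehension as above: keep only the pairs whose first elements match.
merged = [row_a + row_b[1:]
          for row_a, row_b in product(A, B)
          if row_a[0] == row_b[0]]

# product() generates len(A) * len(B) pairs, matching or not.
pairs_visited = len(A) * len(B)

print(merged)          # the row starting with 6 is missing
print(pairs_visited)
```

If unmatched rows need to be kept, a dict-based lookup with a default value is probably the easier route.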

Upvotes: 0

Ashwini Chaudhary

Reputation: 250981

You can convert B to a dictionary first, with the first element of each sublist as key and second one as value.

Then simply iterate over A and append the related value fetched from the dict.

In [114]: A = ([1, 2, 3],
[4, 5, 6],
[4, 6, 7],
[5, 7, 8], 
[6, 9, 1])

In [115]: B = ([1, .002],
[4, .005],
[5, .006])

In [116]: dic = dict(B)

In [117]: [x + [dic[x[0]]] if x[0] in dic else x for x in A]
Out[117]: 
[[1, 2, 3, 0.002],
 [4, 5, 6, 0.005],
 [4, 6, 7, 0.005],
 [5, 7, 8, 0.006],
 [6, 9, 1]]

Upvotes: 0

mgilson

Reputation: 309929

Is it OK to modify A in place? If so:

d = dict((x[0],x[1:]) for x in B)

Now d is a dictionary whose keys come from the first column of B and whose values are the remaining columns.

for lst in A:
    if lst[0] in d: #Is the first value something that we can extend?
        lst.extend(d[lst[0]])

print(A)

To do it out of place (inspired by the answer by Ashwini):

d = dict((x[0],x[1:]) for x in B)
C = [lst + d.get(lst[0],[]) for lst in A]

However, this approach requires the rows of both A and B to be lists. If some rows are lists and some are tuples, the concatenation will fail with a TypeError. That can be worked around if needed, at the cost of slightly more complicated code.

With either of these approaches, B can have an arbitrary number of columns.
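
One possible workaround for mixed lists and tuples is to normalize every row to a list before concatenating. This is only a sketch with made-up data (including an extra third column in B to show that more than two columns work); it is not the only way to handle it.

```python
A = [[1, 2, 3], (4, 5, 6), [5, 7, 8]]                  # mixed lists and tuples (made-up data)
B = [(1, .002, 'x'), [4, .005, 'y'], (5, .006, 'z')]   # three columns instead of two

# Convert each slice of B to a list so the values can be concatenated safely.
d = {row[0]: list(row[1:]) for row in B}

# Convert each row of A to a list too; rows without a match are kept unchanged.
C = [list(row) + d.get(row[0], []) for row in A]

print(C)
```

The list() calls copy each row, so A and B themselves are left untouched.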

As a side note on style: I would write the lists as:

A = [[1, 2, 3],
     [4, 5, 6],
     [4, 6, 7],
     [5, 7, 8], 
     [5, 9, 1]]

Here I've dropped the parentheses. They make it look too much like you're putting a list in a tuple. Python's implicit line continuation works inside parentheses (), square brackets [] or braces {}, so the square brackets alone are enough.
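
The distinction is easy to check. In the sketch below, the first form (as in the question) is still a list, because the parentheses only group a single expression; the second form (as in the In [114] snippet of another answer) has a comma at the top level, so it really is a tuple of lists. The names `A` and `T` are just for illustration.

```python
A = ([[1, 2, 3],
      [4, 5, 6]])       # parentheses only group; this is a list of lists

T = ([1, 2, 3],
     [4, 5, 6])         # comma at the top level: a tuple of lists

print(type(A).__name__)
print(type(T).__name__)
```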

Upvotes: 1

Jason Orendorff

Reputation: 45096

(This answer assumes these are just regular lists. If they’re NumPy arrays, you have more options.)

It looks like you want to use B as a lookup table to find values to add to each row of A.

I would start by making a dictionary out of the data in B. As it happens, B is already in just the right form to be passed to the dict() builtin:

B_dict = dict(B)

Then you just need to build C row by row.

For each row in A, row[0] is the first element, so B_dict[row[0]] is the value you want to add to the end of the row. Therefore row + [B_dict[row[0]]] is the row you want to add to C.

Here is a list comprehension that builds C from A and B_dict.

C = [row + [B_dict[row[0]]] for row in A]
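
Note that dict(B) only works when every row of B has exactly two elements, and that B_dict[row[0]] raises a KeyError when a key from A is missing in B. A minimal sketch of that failure and one possible fallback, assuming a made-up row with key 6 and assuming you want unmatched rows kept unchanged:

```python
A = [[1, 2, 3], [4, 5, 6], [6, 9, 1]]   # key 6 is not in B (made-up row)
B = [[1, .002], [4, .005], [5, .006]]

B_dict = dict(B)

missing_key = None
try:
    C = [row + [B_dict[row[0]]] for row in A]
except KeyError as exc:
    missing_key = exc.args[0]            # the key that had no entry in B
    print("no entry in B for key", missing_key)

# One possible fallback: copy unmatched rows as they are.
C = [row + [B_dict[row[0]]] if row[0] in B_dict else row[:] for row in A]
print(C)
```

Whether unmatched rows should be kept, dropped, or padded with a placeholder depends on what you plan to do with C afterwards.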

Upvotes: 0
