Losbaltica

Reputation: 649

Merge two arrays on a shared column using a list of dictionaries in Python

I am trying to merge two arrays that look like this:

first:

[650001.88, 300442.2,   18.73,  0.575,  650002.094, 300441.668, 18.775]
[650001.96, 300443.4,   18.7,   0.65,   650002.571, 300443.182, 18.745]
[650002.95, 300442.54,  18.82,  0.473,  650003.056, 300442.085, 18.745]
[650005.28, 300444.76,  18.93,  0.463,  650005.368, 300444.395, 18.659]
[650006.17, 312903.26,  14.68,  0.442,  650006.146, 312902.819, 14.68]
[650006.18, 312902.89,  14.91,  0.243,  650006.146, 312902.819, 14.68]
[650006.17, 300445.16,  18.75,  0.402,  650006.286, 300444.792, 18.635]
[650006.8,  312904.65,  14.54,  0.479,  650006.904, 312905.096, 14.68]
[650006.78, 312905.06,  14.81,  0.184,  650006.904, 312905.096, 14.68]
[650011.84, 300447.74,  18.56,  0.546,  650011.836, 300447.197, 18.507]
[650012.96, 300446.92,  18.71,  0.553,  650013.238, 300446.497, 18.488]
[650014.07, 300447.51,  18.41,  0.614,  650014.2,   300446.914, 18.473]
[650001.18, 312862.23,  8.79,   40.338, 650014.526, 312899.965, 13.797]
[650001.19, 312861.88,  9.15,   40.619, 650014.526, 312899.965, 13.797]

sec:

[300441.668,    1]
[300443.182,    2]
[300442.085,    3]
[300444.395,    4]
[312902.819,    5]
[300444.792,    6]
[312905.096,    7]
[300447.197,    8]
[300446.497,    9]
[300446.914,    10]
[312899.965,    11]

The sixth column of the first array as shown above (index 5) holds the same values as the first column of the second array. The first array has almost 50 million records and the second has 50,000. I am trying to merge the two arrays on that shared column.

My final array should look like this:

715316  650001.88   300442.2    18.73   0.575   650002.094  300441.668  18.775  1
715317  650001.96   300443.4    18.7    0.65    650002.571  300443.182  18.745  2
715310  650002.95   300442.54   18.82   0.473   650003.056  300442.085  18.745  3
715304  650005.28   300444.76   18.93   0.463   650005.368  300444.395  18.659  4
129733  650006.17   312903.26   14.68   0.442   650006.146  312902.819  14.68   5
129739  650006.18   312902.89   14.91   0.243   650006.146  312902.819  14.68   5
715303  650006.17   300445.16   18.75   0.402   650006.286  300444.792  18.635  6
129851  650006.8    312904.65   14.54   0.479   650006.904  312905.096  14.68   7
129852  650006.78   312905.06   14.81   0.184   650006.904  312905.096  14.68   7
715302  650011.84   300447.74   18.56   0.546   650011.836  300447.197  18.507  8
715301  650012.96   300446.92   18.71   0.553   650013.238  300446.497  18.488  9
715250  650014.07   300447.51   18.41   0.614   650014.2    300446.914  18.473  10
129121  650001.18   312862.23   8.79    40.338  650014.526  312899.965  13.797  11
129127  650001.19   312861.88   9.15    40.619  650014.526  312899.965  13.797  11
129128  650001.19   312861.54   9.53    40.897  650014.526  312899.965  13.797  11

I managed to do it, but the only problem for now is that my d1 dictionary overwrites duplicate keys, which produces incorrect output.

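import numpy as np

def merge_arrays(first, sec):
    # Problem: rows of `first` that share the same key (index 5) overwrite
    # each other here, so only the last row per key survives.
    d1 = dict((x[5], x[0:]) for x in first)
    d2 = dict((x[0], x[1:]) for x in sec)

    finaldict = {key: (d2[key], d1[key]) for key in d2}

    arr2 = []

    for x in finaldict.values():
        arr2.append(x)
        #print(x)

    # Join each (d2 value, d1 row) pair into one flat row.
    output = np.array(list(map(np.concatenate, arr2)))
    return output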

I guess I need to use a list of dictionaries rather than a plain dictionary, but I don't know how to convert my arrays to a list of dictionaries with duplicate keys.

EDIT:

I tried @zipa's method:

d2 = dict((x[0], x[1:]) for x in sec)
finaldict = [item + d2[item[5]] for item in first]


print(finaldict[0])


[650001.88, 300442.2,   18.73,  0.575,  650002.094, 300441.668, 18.775]

I guess the value is not being appended because of the way my dictionary is created. When I check d2[item[4]], it gives me [1.] rather than just 1. I access item[4] because in my real data it holds the same value as item[5] in the example.

When I access it, it produces this:

[image: Loop]

But it still does not add the value to my merged array.

Upvotes: 1

Views: 848

Answers (2)

zipa

Reputation: 27879

A list comprehension will do it:

first = [[650001.88, 300442.2,   18.73,  0.575,  650002.094, 300441.668, 18.775],
         [650001.96, 300443.4,   18.7,   0.65,   650002.571, 300443.182, 18.745],
         [650002.95, 300442.54,  18.82,  0.473,  650003.056, 300442.085, 18.745],
         [650005.28, 300444.76,  18.93,  0.463,  650005.368, 300444.395, 18.659],
         [650006.17, 312903.26,  14.68,  0.442,  650006.146, 312902.819, 14.68],
         [650006.18, 312902.89,  14.91,  0.243,  650006.146, 312902.819, 14.68],
         [650006.17, 300445.16,  18.75,  0.402,  650006.286, 300444.792, 18.635],
         [650006.8,  312904.65,  14.54,  0.479,  650006.904, 312905.096, 14.68],
         [650006.78, 312905.06,  14.81,  0.184,  650006.904, 312905.096, 14.68],
         [650011.84, 300447.74,  18.56,  0.546,  650011.836, 300447.197, 18.507],
         [650012.96, 300446.92,  18.71,  0.553,  650013.238, 300446.497, 18.488],
         [650014.07, 300447.51,  18.41,  0.614,  650014.2,   300446.914, 18.473],
         [650001.18, 312862.23,  8.79,   40.338, 650014.526, 312899.965, 13.797],
         [650001.19, 312861.88,  9.15,   40.619, 650014.526, 312899.965, 13.797]]
second = [[300441.668,    1],
          [300443.182,    2],
          [300442.085,    3],
          [300444.395,    4],
          [312902.819,    5],
          [300444.792,    6],
          [312905.096,    7],
          [300447.197,    8],
          [300446.497,    9],
          [300446.914,    10],
          [312899.965,    11]]

second_dict = {i[0]: i[1] for i in second}
first_second = [item + [second_dict[item[5]]] for item in first]
print(first_second[0])
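
Note that the comprehension above relies on `item` being a plain Python list, where `+` concatenates. If the rows are NumPy arrays (the question's use of `np` suggests they may be), `+` performs element-wise addition instead of appending. A minimal sketch of a vectorised alternative under that assumption follows; the function name `merge_by_key`, the `key_col` parameter, and the use of NaN for unmatched keys are illustrative choices, not part of the original answer, and only a few rows of the sample data are used.

```python
import numpy as np

# A few rows from the question's sample data, as NumPy arrays.
first = np.array([
    [650001.88, 300442.2, 18.73, 0.575, 650002.094, 300441.668, 18.775],
    [650006.17, 312903.26, 14.68, 0.442, 650006.146, 312902.819, 14.68],
    [650006.18, 312902.89, 14.91, 0.243, 650006.146, 312902.819, 14.68],
])
second = np.array([
    [300441.668, 1],
    [312902.819, 5],
])


def merge_by_key(first, second, key_col=5):
    """Append second[:, 1] to every row of first whose key is in second[:, 0].

    Rows whose key is not found get NaN in the new column (an assumption;
    pick whatever sentinel suits the data).
    """
    # Sort the small lookup table once so it can be binary-searched.
    order = np.argsort(second[:, 0])
    keys = second[order, 0]
    values = second[order, 1]

    # Find, for every row of first, where its key would sit in keys.
    pos = np.searchsorted(keys, first[:, key_col])
    pos = np.clip(pos, 0, len(keys) - 1)

    # Keep the value only where the key matches exactly.
    found = keys[pos] == first[:, key_col]
    labels = np.where(found, values[pos], np.nan)
    return np.column_stack([first, labels])


merged = merge_by_key(first, second)
print(merged[:, -1])
```

Duplicate keys in `first` are handled naturally here, since each row is looked up independently. The match uses exact float equality, which works when both arrays were read from the same source values; keys that differ by rounding would need a tolerance instead.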

Upvotes: 1

Eric Duminil

Reputation: 54263

You only need to convert the second array to a dict:

second_list = [[300441.668,    1],
[300443.182,    2],
[300442.085,    3],
[300444.395,    4],
[312902.819,    5],
[300444.792,    6],
[312905.096,    7],
[300447.197,    8],
[300446.497,    9],
[300446.914,    10],
[312899.965,    11]]

print(dict(second_list))
# {312899.965: 11, 300447.197: 8, 300443.182: 2, 300444.792: 6, 300441.668: 1, 300444.395: 4, 300446.497: 9, 312905.096: 7, 312902.819: 5, 300442.085: 3, 300446.914: 10}

It gives you a fast lookup table for the first array. There's no need to convert the first array to anything else. You might want to use dict.get with a default value if a key isn't found.
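
As a rough sketch of the lookup with `dict.get`, assuming the rows are plain Python lists: the names `first_list`, `lookup` and `MISSING` are illustrative, and the last row of `first_list` is made up here purely to show what happens when a key is absent.

```python
second_list = [[300441.668, 1], [300443.182, 2], [312902.819, 5]]

first_list = [
    [650001.88, 300442.2, 18.73, 0.575, 650002.094, 300441.668, 18.775],
    [650006.17, 312903.26, 14.68, 0.442, 650006.146, 312902.819, 14.68],
    [650006.18, 312902.89, 14.91, 0.243, 650006.146, 312902.819, 14.68],
    # Made-up row (not from the question): its key is not in second_list.
    [650099.0, 300000.0, 10.0, 0.1, 650099.1, 999999.999, 10.0],
]

# Only the small array becomes a dict: key -> value to append.
lookup = dict(second_list)

# Hypothetical sentinel for keys that have no match.
MISSING = None

# Each row of first_list is looked up independently, so rows that share
# a key (rows 2 and 3 here) both get the same value and nothing is lost.
merged = [row + [lookup.get(row[5], MISSING)] for row in first_list]

for row in merged:
    print(row)
```

Because the first array is never turned into a dictionary, its duplicate keys cannot overwrite one another.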

Upvotes: 1
