Reputation: 649
I am trying to merge two arrays that look like this:
first:
[650001.88, 300442.2, 18.73, 0.575, 650002.094, 300441.668, 18.775]
[650001.96, 300443.4, 18.7, 0.65, 650002.571, 300443.182, 18.745]
[650002.95, 300442.54, 18.82, 0.473, 650003.056, 300442.085, 18.745]
[650005.28, 300444.76, 18.93, 0.463, 650005.368, 300444.395, 18.659]
[650006.17, 312903.26, 14.68, 0.442, 650006.146, 312902.819, 14.68]
[650006.18, 312902.89, 14.91, 0.243, 650006.146, 312902.819, 14.68]
[650006.17, 300445.16, 18.75, 0.402, 650006.286, 300444.792, 18.635]
[650006.8, 312904.65, 14.54, 0.479, 650006.904, 312905.096, 14.68]
[650006.78, 312905.06, 14.81, 0.184, 650006.904, 312905.096, 14.68]
[650011.84, 300447.74, 18.56, 0.546, 650011.836, 300447.197, 18.507]
[650012.96, 300446.92, 18.71, 0.553, 650013.238, 300446.497, 18.488]
[650014.07, 300447.51, 18.41, 0.614, 650014.2, 300446.914, 18.473]
[650001.18, 312862.23, 8.79, 40.338, 650014.526, 312899.965, 13.797]
[650001.19, 312861.88, 9.15, 40.619, 650014.526, 312899.965, 13.797]
sec:
[300441.668, 1]
[300443.182, 2]
[300442.085, 3]
[300444.395, 4]
[312902.819, 5]
[300444.792, 6]
[312905.096, 7]
[300447.197, 8]
[300446.497, 9]
[300446.914, 10]
[312899.965, 11]
The 7th column of the first array (index 5 in the sample rows above) holds the same values as the first column of the second array. My first array has almost 50 million records and the second has 50,000. I am trying to merge the two arrays on that shared column.
My final array should look like this:
715316 650001.88 300442.2 18.73 0.575 650002.094 300441.668 18.775 1
715317 650001.96 300443.4 18.7 0.65 650002.571 300443.182 18.745 2
715310 650002.95 300442.54 18.82 0.473 650003.056 300442.085 18.745 3
715304 650005.28 300444.76 18.93 0.463 650005.368 300444.395 18.659 4
129733 650006.17 312903.26 14.68 0.442 650006.146 312902.819 14.68 5
129739 650006.18 312902.89 14.91 0.243 650006.146 312902.819 14.68 5
715303 650006.17 300445.16 18.75 0.402 650006.286 300444.792 18.635 6
129851 650006.8 312904.65 14.54 0.479 650006.904 312905.096 14.68 7
129852 650006.78 312905.06 14.81 0.184 650006.904 312905.096 14.68 7
715302 650011.84 300447.74 18.56 0.546 650011.836 300447.197 18.507 8
715301 650012.96 300446.92 18.71 0.553 650013.238 300446.497 18.488 9
715250 650014.07 300447.51 18.41 0.614 650014.2 300446.914 18.473 10
129121 650001.18 312862.23 8.79 40.338 650014.526 312899.965 13.797 11
129127 650001.19 312861.88 9.15 40.619 650014.526 312899.965 13.797 11
129128 650001.19 312861.54 9.53 40.897 650014.526 312899.965 13.797 11
I managed to do it, but the remaining problem is that my d1 dictionary overwrites duplicated keys, which produces incorrect output.
import numpy as np

def merge_arrays(first, sec):
    # Rows of `first` that share the same x[5] value overwrite each other in d1
    d1 = dict((x[5], x[0:]) for x in first)
    d2 = dict((x[0], x[1:]) for x in sec)
    finaldict = {key: (d2[key], d1[key]) for key in d2}
    arr2 = []
    for x in finaldict.values():
        arr2.append(x)
        #print(x)
    arr = np.asarray(arr2)
    a = np.array(arr)
    output = np.array(list(map(np.concatenate, a)))
    return output
I guess I need to use a list of dictionaries rather than a single dictionary, but I don't know how to convert my arrays to a list of dictionaries that allows duplicated keys.
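To make the overwriting problem concrete, here is a minimal sketch using two sample rows from above that share the same key. It contrasts a plain dict with grouping rows in a list per key via collections.defaultdict. The names overwritten, grouped and merged are only illustrative, and this is one possible way to keep duplicates, not necessarily the best fit for 50 million rows.

```python
from collections import defaultdict

# Two sample rows from the question that share the same key at index 5.
first = [
    [650006.17, 312903.26, 14.68, 0.442, 650006.146, 312902.819, 14.68],
    [650006.18, 312902.89, 14.91, 0.243, 650006.146, 312902.819, 14.68],
]
sec = [[312902.819, 5]]

# A plain dict keeps one row per key, so the second row replaces the first.
overwritten = dict((row[5], row) for row in first)

# Grouping rows in a list per key keeps every duplicate.
grouped = defaultdict(list)
for row in first:
    grouped[row[5]].append(row)

# Append the value from `sec` to every row stored under the matching key.
merged = [row + [value] for key, value in sec for row in grouped[key]]
```

Note that this still builds a dictionary over the large first array, which may be costly in memory at that size.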
EDIT:
I tried @zipa's method:
d2 = dict((x[0], x[1:]) for x in sec)
finaldict = [item + d2[item[5]] for item in first]
print(finaldict[0])
[650001.88, 300442.2, 18.73, 0.575, 650002.094, 300441.668, 18.775]
I guess the value is not appended to the end because of the way my dictionary is created. When I check d2[item[4]], I get [1.] rather than just 1. (I access item[4] because in my data it holds the same value as item[5] in the example.)
Either way, the value is still not added to my merged array.
Upvotes: 1
Views: 848
Reputation: 27879
A list comprehension will do it:
first = [[650001.88, 300442.2, 18.73, 0.575, 650002.094, 300441.668, 18.775],
[650001.96, 300443.4, 18.7, 0.65, 650002.571, 300443.182, 18.745],
[650002.95, 300442.54, 18.82, 0.473, 650003.056, 300442.085, 18.745],
[650005.28, 300444.76, 18.93, 0.463, 650005.368, 300444.395, 18.659],
[650006.17, 312903.26, 14.68, 0.442, 650006.146, 312902.819, 14.68],
[650006.18, 312902.89, 14.91, 0.243, 650006.146, 312902.819, 14.68],
[650006.17, 300445.16, 18.75, 0.402, 650006.286, 300444.792, 18.635],
[650006.8, 312904.65, 14.54, 0.479, 650006.904, 312905.096, 14.68],
[650006.78, 312905.06, 14.81, 0.184, 650006.904, 312905.096, 14.68],
[650011.84, 300447.74, 18.56, 0.546, 650011.836, 300447.197, 18.507],
[650012.96, 300446.92, 18.71, 0.553, 650013.238, 300446.497, 18.488],
[650014.07, 300447.51, 18.41, 0.614, 650014.2, 300446.914, 18.473],
[650001.18, 312862.23, 8.79, 40.338, 650014.526, 312899.965, 13.797],
[650001.19, 312861.88, 9.15, 40.619, 650014.526, 312899.965, 13.797]]
second = [[300441.668, 1],
[300443.182, 2],
[300442.085, 3],
[300444.395, 4],
[312902.819, 5],
[300444.792, 6],
[312905.096, 7],
[300447.197, 8],
[300446.497, 9],
[300446.914, 10],
[312899.965, 11]]
second_dict = {i[0]: i[1] for i in second}
first_second = [item + [second_dict[item[5]]] for item in first]
print(first_second[0])
# [650001.88, 300442.2, 18.73, 0.575, 650002.094, 300441.668, 18.775, 1]
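The comprehension above relies on plain Python lists, where + concatenates. If first and sec are NumPy arrays instead (the question's use of np.asarray suggests they might be), + performs element-wise arithmetic rather than concatenation. The following is a rough sketch of a different, vectorized technique using np.searchsorted and np.column_stack. It assumes the keys in both arrays are bit-identical floats, and the names order, keys, values, pos, found and new_col are only illustrative.

```python
import numpy as np

# A few sample rows from the question, as NumPy arrays.
first = np.array([
    [650001.88, 300442.2, 18.73, 0.575, 650002.094, 300441.668, 18.775],
    [650001.96, 300443.4, 18.7, 0.65, 650002.571, 300443.182, 18.745],
    [650006.17, 312903.26, 14.68, 0.442, 650006.146, 312902.819, 14.68],
    [650006.18, 312902.89, 14.91, 0.243, 650006.146, 312902.819, 14.68],
])
sec = np.array([[300441.668, 1], [300443.182, 2], [312902.819, 5]])

# Sort the small lookup table by key so it can be binary-searched.
order = np.argsort(sec[:, 0])
keys = sec[order, 0]
values = sec[order, 1]

# For each row of `first`, find where its key would sit in the sorted keys.
pos = np.searchsorted(keys, first[:, 5])
pos = np.clip(pos, 0, len(keys) - 1)

# A key counts as found only if the value at that position matches exactly.
found = keys[pos] == first[:, 5]

# Rows without a match get NaN in the new column.
new_col = np.where(found, values[pos], np.nan)
merged = np.column_stack([first, new_col])
```

Because the result is a single float array, the appended values come out as floats (1.0 rather than 1). Rows of first that share a key each receive the same value, so duplicates are preserved.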
Upvotes: 1
Reputation: 54263
You only need to convert the second array to a dict:
second_list = [[300441.668, 1],
[300443.182, 2],
[300442.085, 3],
[300444.395, 4],
[312902.819, 5],
[300444.792, 6],
[312905.096, 7],
[300447.197, 8],
[300446.497, 9],
[300446.914, 10],
[312899.965, 11]]
print(dict(second_list))
# {312899.965: 11, 300447.197: 8, 300443.182: 2, 300444.792: 6, 300441.668: 1, 300444.395: 4, 300446.497: 9, 312905.096: 7, 312902.819: 5, 300442.085: 3, 300446.914: 10}
It gives you a fast lookup table for the first array. There's no need to convert the first array to anything else. You might want to use dict.get with a default value if a key isn't found.
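As a small illustration of the dict.get suggestion, the sketch below merges two rows, one of which has no match in the lookup table. The second row is made up for the example, and using None as the default is an assumption; any sentinel value would work.

```python
first = [
    [650001.88, 300442.2, 18.73, 0.575, 650002.094, 300441.668, 18.775],
    # Hypothetical row whose key (index 5) is absent from second_list.
    [650020.0, 300450.0, 18.0, 0.5, 650020.1, 300450.5, 18.1],
]
second_list = [[300441.668, 1], [300443.182, 2]]

# Build the lookup table once from the small array.
lookup = dict(second_list)

# dict.get returns the default instead of raising KeyError on a missing key.
merged = [row + [lookup.get(row[5], None)] for row in first]

print(merged[0][-1])  # 1
print(merged[1][-1])  # None
```

With lookup[row[5]] instead of lookup.get(...), the second row would raise a KeyError.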
Upvotes: 1