Pukki

Reputation: 103

Make a new dictionary from keys of one of the two dictionaries being compared, Python

I have two dictionaries with coordinates:

vertex_coordinates = {0: [x0,y0,z0], 1: [x1,y1,z1], 2: [x2,y2,z2] ...}
element_coordinates = {0: [X0,Y0,Z0], 2: [X2,Y2,Z2], 7: [X3,Y3,Z3] ...}

The keys of the first dictionary run simply from 0 to N, while the keys of the second dictionary are sorted but not necessarily consecutive. The second dictionary is actually much larger than the first; one particular case is

len(vertex_coordinates) = 729
len(element_coordinates) = 58752

What I want is a dictionary in which the key represents the key of the first dictionary and the value associated with this key is a list of keys from the second dictionary such that the coordinates are equal. For example, let

vertex_coordinates = {0: [1.0,1.0,1.0], 1: [0.0,0.0,0.0], 2: [3.0,4.0,5.0], 3: [3.0, 6.0, 7.0]}
element_coordinates = {0: [0.0,0.0,0.0], 1: [3.0,4.0,5.0], 3: [3.0,6.0,7.0],
   4: [1.0,1.0,1.0], 6: [0.0,0.0,0.0], 7: [3.0,4.0,5.0], 8: [1.0,1.0,1.0],
   10: [3.0,6.0,7.0]}

Then, the desired dictionary is

element_to_vertex = {0: [4,8], 1: [0,6], 2: [1,7], 3: [3,10]}

It may or may not be important, but my data is structured such that no keys from dictionary 2 will be left over at the end of this process: all of them end up in the resulting dictionary. In other words, the set of values of dict2 equals the set of values of dict1.

The way I implemented it is:

element_to_vertex = {}
for vertex in vertex_coordinates:
  temp = []
  for elem in element_coordinates:
    if(near(element_coordinates[elem][0], vertex_coordinates[vertex][0])):
      if(near(element_coordinates[elem][1], vertex_coordinates[vertex][1])):
        if(near(element_coordinates[elem][2], vertex_coordinates[vertex][2])):
          temp.append(elem)

  element_to_vertex[vertex] = temp

While this works fine, it is very slow: on the example with lengths of dictionaries being 729 and 58752 it takes about 25 seconds to run, and these lengths are not the largest I am interested in using. Could you please tell me if it is possible to speed it up or should I think of another way around this problem? Thank you.
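For reference, `near` in the question is not defined; it is assumed here to be a simple absolute-tolerance float comparison (as in, e.g., FEniCS). A minimal stand-in, with an assumed tolerance, would be:

```python
def near(a, b, tol=1e-12):
    """Return True if two floats are equal within an absolute tolerance."""
    return abs(a - b) <= tol
```

The tolerance value is an assumption and would need to be tuned to the precision of the coordinate data.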

Upvotes: 3

Views: 117

Answers (3)

Dunes

Reputation: 40683

You may wish to reconsider how you store your data. You could use a numpy array to store your vertex coordinates and a scipy sparse matrix to store your element coordinates. You'll keep the space efficiency, but also gain efficient ways to manipulate your data.

from scipy.sparse import coo_matrix
from itertools import chain
import numpy as np

# input as specified
vertex_coordinates = {0: [1.0,1.0,1.0], 1: [0.0,0.0,0.0], 2: [3.0,4.0,5.0], 3: [3.0, 6.0, 7.0]}
element_coordinates = {0: [0.0,0.0,0.00000001], 1: [3.0,4.0,5.0], 3: [3.0,6.0,7.0],
   4: [1.0,1.0,1.0], 6: [0.0,0.0,0.0], 7: [3.0,4.0,5.0], 8: [1.0,1.0,1.0],
   10: [3.0,6.0,7.0]}

# conversion to numpy array and sparse array
vertex_coordinates = np.array(list(vertex_coordinates.values()), dtype=float)
rows = list(chain.from_iterable([i] * 3 for i in element_coordinates))
cols = list(range(3)) * len(element_coordinates)
data = list(chain.from_iterable(element_coordinates.values()))
element_coordinates = coo_matrix((data, (rows, cols)))
del rows, cols, data

# create output
num_cols = vertex_coordinates.shape[1] # 3
num_rows = len(element_coordinates.row) // num_cols # 8 in this case
shape = num_rows, num_cols

element_to_vertex = {}
# data and row are flat arrays, reshape array to have 3 columns
data_view = element_coordinates.data.reshape(shape)
row_indices = element_coordinates.row[::num_cols]
for i, row in enumerate(vertex_coordinates):
    # compare each row in element_coordinates to see if there is any match
    matches = np.isclose(row, data_view)
    # keep only the rows that completely matched
    row_matches = matches.all(axis=1)
    if row_matches.any():
        # if at least one row matched then get their indices 
        indices = row_indices[row_matches]
        element_to_vertex[i] = indices.tolist()

print(element_to_vertex)
# prints {0: [4, 8], 1: [0, 6], 2: [1, 7], 3: [3, 10]}

This should speed up your program, but without being able to know the full structure of your data I may have made assumptions that aren't necessarily true.

Upvotes: 1

Kassym Dorsel

Reputation: 4843

Currently you are iterating over element_coordinates for each entry in vertex_coordinates. This is, as you see, quite slow.

Why not make a new dictionary that is the inverse of element_coordinates: {(1.0,1.0,1.0): [4, 8], ...}? That way you iterate over it only once and afterwards just do fast lookups.

There is one catch with this (thanks @Lukas Graf): floats don't always compare exactly, so this might not work. If the coordinates are computed, there may be rounding errors and the lookup will not behave as expected; that is why you are using the near method in your question. You could look into Python's decimal module for a potential fix. If the data is relatively clean or fixed, there should be no issue.
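If an approximate match is acceptable, one common workaround (a sketch, not part of this answer's code; the number of decimals is an assumption to tune to your data) is to round the coordinates before using them as keys, so that nearly-equal floats hash to the same tuple:

```python
from collections import defaultdict

def quantize(coords, decimals=6):
    """Round each coordinate so nearly-equal floats map to the same key."""
    return tuple(round(c, decimals) for c in coords)

# Hypothetical data with a small rounding error in the entry for key 0:
element_coordinates = {0: [0.0, 0.0, 0.00000001],
                       4: [1.0, 1.0, 1.0],
                       8: [1.0, 1.0, 1.0]}

inv_el_coords = defaultdict(list)
for k, v in element_coordinates.items():
    inv_el_coords[quantize(v)].append(k)
```

Note this is weaker than a true tolerance check: two values straddling a rounding boundary (e.g. 0.4999996 and 0.5000004) can still land in different buckets.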

Doing it this way, you iterate over each dictionary only once: instead of O(n^2), the algorithm becomes O(n). It uses more memory, but that is the usual time/space trade-off.

You would do something like this:

from collections import defaultdict
vertex_coordinates = {0: [1.0,1.0,1.0], 1: [0.0,0.0,0.0], 2: [3.0,4.0,5.0], 3: [3.0, 6.0, 7.0]}
element_coordinates = {0: [0.0,0.0,0.0], 1: [3.0,4.0,5.0], 3: [3.0,6.0,7.0], 4: [1.0,1.0,1.0], 6: [0.0,0.0,0.0], 7: [3.0,4.0,5.0], 8:[1.0,1.0,1.0], 10: [3.0,6.0,7.0]}

inv_el_coords = defaultdict(list)

for k, v in element_coordinates.items():
    inv_el_coords[tuple(v)].append(k)

element_to_vertex = {k:inv_el_coords[tuple(v)] for k,v in vertex_coordinates.items()}

print(element_to_vertex)

On a side note, if it were possible to store the data as tuples initially, that would help with speed, since there would be no need to convert the lists to tuples. From what I can see this shouldn't be a problem, since the value lists are always three items long. If you have to change a value in one, just replace the whole tuple.
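The tuple suggestion can be sketched like this (hypothetical data; the point is that tuples are hashable and usable as dict keys directly, and that "changing" a coordinate means rebuilding the tuple):

```python
# Coordinates stored as tuples from the start:
vertex_coordinates = {0: (1.0, 1.0, 1.0), 1: (0.0, 0.0, 0.0)}

# Tuples are hashable, so no tuple(v) conversion is needed for lookups:
index = {v: k for k, v in vertex_coordinates.items()}

# Tuples are immutable; to change one coordinate, replace the whole tuple:
vertex_coordinates[0] = (2.0,) + vertex_coordinates[0][1:]
```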

Upvotes: 3

Bob Dylan

Reputation: 1833

I don't have your data so I can't test the performance for myself, but what about a big evil list comprehension? Something like this?

element_to_vertex = {}
for vertex in vertex_coordinates:
    vx, vy, vz = vertex_coordinates[vertex]
    element_to_vertex[vertex] = [elem for elem in element_coordinates
                                 if near(element_coordinates[elem][0], vx)
                                 and near(element_coordinates[elem][1], vy)
                                 and near(element_coordinates[elem][2], vz)]

You probably won't notice a huge improvement in speed, but perhaps some, since the comprehension avoids looking up the append() method on every iteration. For better performance, consider dropping into C.
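The append()-lookup point also applies to the original loop without a comprehension: binding the bound method to a local name once removes one attribute lookup per match. A sketch (assuming near is the asker's tolerance comparison, passed in here as a parameter):

```python
def collect_matches(vertex, elements, near):
    """Gather keys of `elements` whose three coordinates all match `vertex`."""
    matches = []
    append = matches.append      # one attribute lookup instead of one per hit
    vx, vy, vz = vertex          # unpack the target coordinates once
    for key, (ex, ey, ez) in elements.items():
        if near(ex, vx) and near(ey, vy) and near(ez, vz):
            append(key)
    return matches
```

This is a micro-optimization; the dictionary-inversion approach in the other answer changes the asymptotic complexity and will matter far more at these sizes.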

Upvotes: 0
