I'mvinay
I'mvinay

Reputation: 1

Python Compare Dictionaries with similar and exact keys

I have a scenario where to compare two dictionaries based on the set of keys. i.e

TmpDict ={}
TmpDict2={}
for line in reader:
    line = line.strip()
    TmpArr=line.split('|')
    TmpDict[TmpArr[2],TmpArr[3],TmpArr[11],TmpArr[12],TmpArr[13],TmpArr[14]]=line
for line in reader2:
    line = line.strip()
    TmpArr=line.split('|')
    TmpDict2[TmpArr[2],TmpArr[3],TmpArr[11],TmpArr[12],TmpArr[13],TmpArr[14]]=line

This works fine comparing two dictionaries with exactly same keys but there is a tolerance ,i need to consider.That is .. TmpArr[12],TmpArr[14] are time and duration where tolerance need to be checked.Please see the example below

Example:

dict1={(111,12,23,12:22:30,12:23:34,64):     4|1994773966623|773966623|754146741|\N|359074037474030|413025600032728|}
dict2={(111,12,23,12:22:34,12:23:34,60) :4|1994773966623|773966623|754146741|\N|359074037474030|413025600032728|} 

Lets assume i have two dictionaries with length 1 each and tolerance is '4'seconds so the above keys must be considered as Matched lines even though there is a difference in the time and duration of 4 seconds. I know Dictionaries search for a key is o(1) irrespective of the length ,how could i achieve this scenario with same performance. Thanks

Upvotes: 0

Views: 159

Answers (2)

You have at least these 4 options:

  1. store all keys within the tolerance (consumes memory).

  2. Look up keys with tolerance. Notice that if the tolerance is defined and constant, then lookups are C * O(N) which is O(n).

  3. combine the previous ones: compress the keys using some scheme, say round down to divisible to 4, and up to divisible by 4, and then store the value for these keys in the dictionary, and verify from the exact value if it is correct.

  4. or do not use a dictionary but instead some kind of tree structure; notice that you can still store the exact part of the key in a dictionary.

As such, you do not provide enough information to decide whichever is best of these. However I personally would go for 3.

Upvotes: 1

Aliakbar Abbasi
Aliakbar Abbasi

Reputation: 249

If you can use more memory to keep performance as above code, you can insert more than one entry for each element. For example "111,12,23,12:22:30,12:23:34,60", "111,12,23,12:22:30,12:23:34,61", "... , "111,12,23,12:22:30,12:23:34,68" are inserted for just ""111,12,23,12:22:30,12:23:34,64" key. If you want to not waste memory but o(1) performance is kept, You can check 8 keys (4 before and 4 after) for one key. It has 8 times more comparison than above code but is o(1) also.

Upvotes: 0

Related Questions