Reputation: 466
I have an API that creates a JSON file, like below:
"tesla_2.0": {
"kind": "Auto",
"tar_path": "/home/scripts/project_2/tesla_2.0.zip",
"version": "2.0",
"yaml_path": "/home/scripts/project_2/test.yaml",
"name": "tesla"
}
Since I'm reading it from a file, I use json.load() that will lose the order of the saved object unless I tell it to load into an OrderedDict().
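For illustration, this is the loading pattern described above — a minimal sketch using `object_pairs_hook=OrderedDict` (the `StringIO` object stands in for a real file handle):

```python
import json
from collections import OrderedDict
from io import StringIO

# StringIO stands in for an open file; json.load accepts any file-like object
raw = '{"b": 1, "a": 2}'
data = json.load(StringIO(raw), object_pairs_hook=OrderedDict)
print(list(data.keys()))  # keys come back in the order written: ['b', 'a']
```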
Is there a simple and efficient way to compare the two files?
import json
import os

def compare_json_files(file_1, file_2):
    if not os.path.isfile(file_1):
        raise FileNotFoundError("File not found: {}".format(file_1))
    if not os.path.isfile(file_2):
        raise FileNotFoundError("File not found: {}".format(file_2))
    with open(file_1, 'r') as f1:
        data_1 = json.load(f1)  # json.load for file objects, not json.loads
    with open(file_2, 'r') as f2:
        data_2 = json.load(f2)
    # comparison operation
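One point worth noting for the comparison step: plain `dict` equality in Python compares keys and values regardless of insertion order, so for an equality check the `OrderedDict` is not strictly needed. A minimal sketch (the helper name `compare_json_strings` is hypothetical, not from the question):

```python
import json

def compare_json_strings(s1, s2):
    # dict equality ignores key order, so differently ordered
    # but structurally identical JSON documents compare equal
    return json.loads(s1) == json.loads(s2)

a = '{"name": "tesla", "version": "2.0"}'
b = '{"version": "2.0", "name": "tesla"}'
print(compare_json_strings(a, b))  # True
```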
Python version : 3.5.2
Upvotes: 0
Views: 888
Reputation: 841
I believe you can check every key and value. First check that the sets of keys are equal on both sides; then a key-by-key comparison makes sense.
assert data_1.keys() == data_2.keys()
err_log = ['Err log:']
for k, v in data_1.items():
    try:
        assert v == data_2[k]
    except AssertionError:
        err_log.append('Error caught for key={}, data_1 value={}, data_2 value={}'.format(k, v, data_2[k]))
for e in err_log:
    print(e)
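The same idea can be run end-to-end with a dict comprehension that collects the mismatches (the sample dicts here are made up for illustration, not from the question):

```python
# hypothetical sample data with one differing value
data_a = {"kind": "Auto", "version": "2.0"}
data_b = {"kind": "Auto", "version": "2.1"}

assert data_a.keys() == data_b.keys()
# collect (old, new) pairs for every key whose values differ
diffs = {k: (v, data_b[k]) for k, v in data_a.items() if v != data_b[k]}
print(diffs)  # {'version': ('2.0', '2.1')}
```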
Results:
from copy import deepcopy
from time import time
from operator import itemgetter

n = 10000000
v = {"stuff": "here", "and": "there"}
data_1 = {str(k): deepcopy(v) for k in range(0, n)}
data_2 = {str(k): deepcopy(v) for k in range(n-1, -1, -1)}

def get_time(f):
    def _(*args, **kwargs):
        t_0 = time()
        for x in range(10):
            f(*args, **kwargs)
        return time() - t_0
    return _

def with_dict_keys(d):
    return d.keys()

def with_sorted_dict_keys(d):
    return sorted(d.keys())

@get_time
def order_n_compare(key_func, d, d_):
    k_d, k_d_ = key_func(d), key_func(d_)
    assert k_d == k_d_
    for k in k_d:
        assert d[k] == d_[k]

@get_time
def itemgetter_compare(key_func, d, d_):
    k_d, k_d_ = key_func(d), key_func(d_)
    assert k_d == k_d_
    assert itemgetter(*k_d)(d) == itemgetter(*k_d)(d_)
The cost of the dict.keys() call is negligible compared to iterating through all entries of data_1.items(), which grows as O(n), so it's not really worth optimizing. In CPython, dict.keys() returns a view object in constant time, so obtaining the keys is not the bottleneck either.

Upvotes: 1