Reputation: 61
I am trying to compare two files in which each line is a JSON document. I need to compare the files line by line and return the differences. The files are too large for me to read into memory and compare each line. Please suggest an optimised way of doing this.
Upvotes: 6
Views: 28525
Reputation: 77
This seems to be a pretty solid start: https://github.com/ZoomerAnalytics/jsondiff
$ pip install jsondiff
>>> from jsondiff import diff
>>> diff({'a': 1, 'b': 2}, {'b': 3, 'c': 4}, syntax='symmetric')
{insert: {'c': 4}, 'b': [2, 3], delete: {'a': 1}}
I'm also going to try it out on a current project, and I'll update this answer as I go along.
Upvotes: 3
Reputation: 4767
Two possible ways:
Given that you have a large file, you are better off using the difflib technique described in point 1.
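As a rough illustration of the difflib idea, here is a minimal sketch that diffs one pair of JSON lines. The function name `diff_json_pair` is made up for this example, and it assumes each input string is a single valid JSON document. Both sides are re-serialised with sorted keys first, so key order and whitespace do not show up as differences.

```python
import difflib
import json


def diff_json_pair(line_a, line_b):
    """Return unified-diff lines for two JSON strings, ignoring key order."""
    # Re-serialise with sorted keys and fixed indentation so that only
    # real content changes appear in the diff.
    text_a = json.dumps(json.loads(line_a), sort_keys=True, indent=2).splitlines()
    text_b = json.dumps(json.loads(line_b), sort_keys=True, indent=2).splitlines()
    # unified_diff yields nothing at all when the two sequences are equal.
    return list(
        difflib.unified_diff(text_a, text_b, fromfile="a", tofile="b", lineterm="")
    )


if __name__ == "__main__":
    for row in diff_json_pair('{"a": 1, "b": 2}', '{"b": 3, "a": 1}'):
        print(row)
```

Note that difflib works on in-memory sequences, so it is best applied to one pair of lines at a time rather than to the two whole files.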
Edit based on the response to my answer below:
After some research, it appears that the best way to deal with large data payloads is to process them as a stream. Handling the data incrementally, rather than loading it all at once, keeps memory usage low and processing fast.
Refer to this link, which covers streaming JSON data objects with Python. Also take a look at ijson, an iterator-based JSON parsing library for Python.
Hopefully this helps you identify a library that fits your use case.
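For files where every line is its own JSON document, the standard library may already be enough to stream, because a Python file object yields one line at a time. The sketch below is only that, a sketch: `iter_line_diffs` and `MISSING` are names invented for this example, and it assumes every line is non-empty, valid JSON and that the two files are meant to line up by line number. A parser such as ijson becomes more relevant when a single JSON document is too large to load.

```python
import json
from itertools import zip_longest

# Sentinel for "this file has no line here", distinct from a JSON null.
MISSING = object()


def iter_line_diffs(path_a, path_b):
    """Yield (line_number, obj_a, obj_b) for each line whose parsed JSON differs."""
    with open(path_a, encoding="utf-8") as fa, open(path_b, encoding="utf-8") as fb:
        # Iterating the file objects reads lazily, so only the current
        # line from each file is held in memory.
        for line_no, (raw_a, raw_b) in enumerate(zip_longest(fa, fb), start=1):
            obj_a = json.loads(raw_a) if raw_a is not None else MISSING
            obj_b = json.loads(raw_b) if raw_b is not None else MISSING
            # Comparing parsed objects ignores key order and whitespace.
            if obj_a != obj_b:
                yield line_no, obj_a, obj_b
```

Because the function is a generator, the caller can stop early or write each difference out as it is found, instead of collecting them all in a list.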
Upvotes: 3