GeForce

Reputation: 21

How would I detect/find duplicate values in a JSON file via Python dictionaries?

I'm fairly new to Python and I'm trying to figure out how to find all duplicates within a JSON file. So far I've written this Python script to open the JSON file and parse the report. I need a way to find all potential duplicate transactions and print a line for each one containing the date, amount, description, and transaction ID. Please let me know if I'm on the right path; any suggestions or pointers would help.

import json

# Open and parse the formatted JSON file - the report contains items, accounts and transactions
with open("42525022_formatted-1.json", "r") as file_handle:
    parsed = json.load(file_handle)
transactions = parsed["report"]["items"][0]["accounts"][0]["transactions"]

# Group transactions by calendar date
transactions_by_date = {}

for txn in transactions:
    date = txn["date"]
    if date not in transactions_by_date:
        transactions_by_date[date] = []
    transactions_by_date[date].append(
        {
            "amount": txn["amount"],
            "description": txn["original_description"],
            "transaction_id": txn["transaction_id"]
        }
    )
#Ignored    
#print(txn["date"] + "\n"  + str(txn["amount"]))
#print(transactions_by_date)

# Quick sanity check: print the grouped transactions for a single date
for date in transactions_by_date:
    print(transactions_by_date[date])
    break

# Objective:
# Print all duplicates within a calendar date; each line should show the date, amount, description and transaction ID
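
For reference, here is a minimal sketch of how this grouping could be extended to flag duplicates with plain dictionaries (assuming two transactions on the same date count as duplicates when they share the same amount and original description):

from collections import defaultdict

# Within each date, group transactions that share the same amount and description
for date, txns in transactions_by_date.items():
    seen = defaultdict(list)
    for txn in txns:
        seen[(txn["amount"], txn["description"])].append(txn)
    for (amount, description), group in seen.items():
        if len(group) > 1:
            # Every member of the group is a potential duplicate
            for txn in group:
                print(date, amount, description, txn["transaction_id"])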

Example JSON File Contents

                "account_id": "zbbbZEdzo4iZbed98AbzHeqr3VX0NztOBQgZe",
                "amount": 0,
                "date": "2022-07-02",
                "iso_currency_code": "USD",
                "original_description": "GOOGLE *ADS598329",
                "pending": true,
                "transaction_id": "1XXX9XbVRKHj8eN66",
                "unofficial_currency_code": null
              },

Upvotes: 0

Views: 979

Answers (1)

1extralime

Reputation: 626

Would just detecting a duplicate ID be sufficient, or is there a chance there are multiple transactions with the same ID but differing values for the other attributes? I know you asked about achieving this via a Python dictionary, but an additional tool would help here.
I would suggest using a library like pandas; then you can think of your data as a spreadsheet.

import pandas as pd

# Load the list of transaction dicts into a DataFrame, one row per transaction
df = pd.DataFrame(transactions)
# duplicated() returns a boolean Series marking rows that repeat an earlier row
duplicates = df.duplicated()

Check out the documentation:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.duplicated.html
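
If you also want to print the duplicate rows themselves, a sketch along these lines should work (assuming the column names from the example JSON; keep=False flags every member of a duplicate group rather than only the later repeats):

# Rows that share date, amount and description with at least one other row
dupes = df[df.duplicated(subset=["date", "amount", "original_description"], keep=False)]
print(dupes[["date", "amount", "original_description", "transaction_id"]])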

Upvotes: 1
