Amit Bharti

Reputation: 1

Merging duplicate dictionaries in a list and combining their values

If two dictionaries in a list are duplicates based on two or more keys, they should be merged into a single dictionary by applying an arithmetic operation to the other values.

Example: two dictionaries are duplicates when both 'CUSTOMER' and 'ID' match. Their 'USAGE' values should be added together in the resulting dictionary.

If 'COUNTRY CODE' differs between the duplicates, the first one should be retained in the resulting dictionary.

Input:

[
  {
    "CUSTOMER": "XYZ",
    "COUNTRY CODE": "US",
    "ID": "Essential",
    "USAGE": 500
  },
  {
    "CUSTOMER": "XYZ",
    "COUNTRY CODE": "US",
    "ID": "Seats",
    "USAGE": 20
  },
  {
    "CUSTOMER": "XYZ",
    "COUNTRY CODE": "FR",
    "ID": "Essential",
    "USAGE": 50
  }
]

Output:

[
  {
    "CUSTOMER": "XYZ",
    "COUNTRY CODE": "US",
    "ID": "Essential",
    "USAGE": 550
  },
  {
    "CUSTOMER": "XYZ",
    "COUNTRY CODE": "US",
    "ID": "Seats",
    "USAGE": 20
  }
]

Upvotes: 0

Views: 43

Answers (2)

Highland Mark

Reputation: 1010

Nice little exercise!

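Here's my take on it. We'll need to sort the list so that duplicates end up next to each other, then compare each record with the previous one:

# Sort on a (CUSTOMER, ID) tuple rather than a concatenated string, so that
# e.g. ('A', 'BC') and ('AB', 'C') cannot produce the same sort key.
# The sort is stable, so the first duplicate in the input stays first and
# its 'COUNTRY CODE' is the one that is retained.
customers.sort(key=lambda cust: (cust['CUSTOMER'], cust['ID']))
result = []
previous_cust = None
for cust in customers:

    if previous_cust is None:   # first time through
        previous_cust = cust
        continue

    if (previous_cust['CUSTOMER'] == cust['CUSTOMER']
            and previous_cust['ID'] == cust['ID']):
        # duplicate: add its usage to the record being accumulated
        # (this updates the dictionary inside customers in place)
        previous_cust['USAGE'] += cust['USAGE']
    else:
        result.append(previous_cust)
        previous_cust = cust

if previous_cust is not None:   # tidy up: append the last group
    result.append(previous_cust)
result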

gives:

[{'CUSTOMER': 'XYZ', 'COUNTRY CODE': 'US', 'ID': 'Essential', 'USAGE': 550},
 {'CUSTOMER': 'XYZ', 'COUNTRY CODE': 'US', 'ID': 'Seats', 'USAGE': 20}]
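
The same sort-then-merge idea can also be written with itertools.groupby from the standard library. This is only a sketch under the assumptions of the question: merge_sorted is a hypothetical helper name, the duplicate keys are 'CUSTOMER' and 'ID', and 'USAGE' is numeric. It copies the first record of each group, so the input list is left untouched.

```python
from itertools import groupby
from operator import itemgetter


def merge_sorted(records):
    """Merge records that share CUSTOMER and ID, summing USAGE.

    Hypothetical helper for illustration; key names follow the question.
    """
    key = itemgetter("CUSTOMER", "ID")
    merged = []
    # sorted() is stable, so within each group the input order is kept
    # and group[0] is the first duplicate seen in the original list.
    for _, group in groupby(sorted(records, key=key), key=key):
        group = list(group)
        first = dict(group[0])  # copy, so the input is not mutated
        first["USAGE"] = sum(rec["USAGE"] for rec in group)
        merged.append(first)
    return merged


customers = [
    {"CUSTOMER": "XYZ", "COUNTRY CODE": "US", "ID": "Essential", "USAGE": 500},
    {"CUSTOMER": "XYZ", "COUNTRY CODE": "US", "ID": "Seats", "USAGE": 20},
    {"CUSTOMER": "XYZ", "COUNTRY CODE": "FR", "ID": "Essential", "USAGE": 50},
]
print(merge_sorted(customers))
```

Because groupby only groups adjacent items, the sort on the same key is what makes this work; without it the two 'Essential' records would not be merged.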

Upvotes: 0

jpp

Reputation: 164733

I recommend you use a third-party library such as pandas for this task.

Given a list of dictionaries J, you can group the rows with groupby, aggregate each group, and then convert the result back to a list of dictionaries with to_dict.

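import pandas as pd

# J is the list of dictionaries from the question
res = (
    pd.DataFrame(J)
    .groupby(['CUSTOMER', 'ID'])                     # duplicates share both keys
    .agg({'USAGE': 'sum', 'COUNTRY CODE': 'first'})  # sum usage, keep first country code
    .reset_index()
    .to_dict(orient='records')
)

print(res)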

[{'COUNTRY CODE': 'US', 'CUSTOMER': 'XYZ', 'ID': 'Essential', 'USAGE': 550},
 {'COUNTRY CODE': 'US', 'CUSTOMER': 'XYZ', 'ID': 'Seats', 'USAGE': 20}]

You can also use collections.defaultdict with some messy if statements. I think the pandas way is cleaner and more easily adaptable.
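
For comparison, a version without pandas can be built on a plain dictionary keyed by the duplicate fields. The following is a minimal sketch, not a drop-in replacement: merge_by_key and its parameters are hypothetical names, and it relies on dictionaries preserving insertion order (Python 3.7+), so results come out in first-seen order and the first 'COUNTRY CODE' is the one kept.

```python
def merge_by_key(records, keys=("CUSTOMER", "ID"), sum_field="USAGE"):
    """Merge records with equal values for `keys`, summing `sum_field`.

    Hypothetical helper for illustration; all other fields are taken
    from the first record seen for each key.
    """
    merged = {}
    for rec in records:
        k = tuple(rec[name] for name in keys)
        if k in merged:
            # duplicate: accumulate into the record already stored
            merged[k][sum_field] += rec[sum_field]
        else:
            # first occurrence: store a copy so the input is not mutated
            merged[k] = dict(rec)
    return list(merged.values())


J = [
    {"CUSTOMER": "XYZ", "COUNTRY CODE": "US", "ID": "Essential", "USAGE": 500},
    {"CUSTOMER": "XYZ", "COUNTRY CODE": "US", "ID": "Seats", "USAGE": 20},
    {"CUSTOMER": "XYZ", "COUNTRY CODE": "FR", "ID": "Essential", "USAGE": 50},
]
print(merge_by_key(J))
```

This makes a single pass over the list and needs no sorting, at the cost of hard-coding one aggregation per field.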

Upvotes: 2
