Amjasd Masdhash

Reputation: 178

Filter a list of dictionaries based on two keys

import csv

with open('test.csv', newline='') as f:
    list_of_dicts = [{k: v for k, v in row.items()} for row in csv.DictReader(f, skipinitialspace=True)]
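Side note on the reading step: each row that `csv.DictReader` yields is already a mapping from header name to value, so the comprehension just copies it into a plain dict, and every value comes out as a string (including the prices). A minimal self-contained sketch, using a hypothetical three-column CSV held in an `io.StringIO` in place of `test.csv`:

```python
import csv
import io

# Hypothetical stand-in for test.csv. Note the spaces after the commas,
# which skipinitialspace=True strips from headers and values alike.
sample_csv = io.StringIO(
    "ASIN, Merchant_1, Merchant_1_Price\n"
    "B004B3M5UU, Homedepot, 72.65\n"
    "B004B3M5UU, Overstock, 61.64\n"
)

# dict(row) does the same job as {k: v for k, v in row.items()}.
list_of_dicts = [dict(row) for row in csv.DictReader(sample_csv, skipinitialspace=True)]

print(list_of_dicts[1])
# {'ASIN': 'B004B3M5UU', 'Merchant_1': 'Overstock', 'Merchant_1_Price': '61.64'}
```

Because the prices are strings, they have to be converted to numbers before comparing them.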

Hello, I have a CSV file which I turn into a list of dictionaries. I want to filter that list on ASIN: when an ASIN appears more than once, I want to keep only the entry with the lowest "Merchant_1_Price" and drop the other duplicates. Not every ASIN has duplicates, and entries without duplicates should be kept as they are. The result should go into a new list. Here is a sample of the list:

{'Product Name': 'NFL Buffalo Bills Bedding Set, Twin', 'Amazon Price': '84.99', 'ASIN': 'B004B3M5UU', 'Merchant_1': 'Homedepot', 'Merchant_1_Price': '72.65', 'Merchant_1_Stock': 'False', 'Merchant_1_Link': 'https://www.homedepot.com/p/Jaguars-2-PIECE-Draft-Multi-Twin-Comforter-Set-1NFL862000014RET/303181069', 'Amazon Image': '=IMAGE("{temp}",4,100,100)', 'Merchant_1_Image': '=IMAGE("{temp}",4,100,100)'}
{'Product Name': 'NFL Buffalo Bills Bedding Set, Twin', 'Amazon Price': '84.99', 'ASIN': 'B004B3M5UU', 'Merchant_1': 'Overstock', 'Merchant_1_Price': '61.64', 'Merchant_1_Stock': 'False', 'Merchant_1_Link': 'https://www.overstock.com/Bedding-Bath/The-Northwest-Company-NFL-Buffalo-Bills-Draft-Twin-2-piece-Comforter-Set/13330480/product.html', 'Amazon Image': '=IMAGE("{temp}",4,100,100)', 'Merchant_1_Image': '=IMAGE("{temp}",4,100,100)'}
{'Product Name': 'EGO Power+ HT2400 24-Inch 56-Volt Lithium-ion Cordless Hedge Trimmer - Battery and Charger Not Included', 'Amazon Price': '129.0', 'ASIN': 'B00N0A4S1O', 'Merchant_1': 'Homedepot', 'Merchant_1_Price': '129.00', 'Merchant_1_Stock': 'True', 'Merchant_1_Link': 'https://www.homedepot.com/p/EGO-24-in-56-Volt-Lithium-Ion-Cordless-Hedge-Trimmer-Battery-and-Charger-Not-Included-HT2400/205163108', 'Amazon Image': '=IMAGE("{temp}",4,100,100)', 'Merchant_1_Image': '=IMAGE("{temp}",4,100,100)'}

I have tried several versions using two for loops, but I can't seem to get the logic right.

Any help is appreciated.

Upvotes: 0

Views: 86

Answers (1)

Blckknght

Reputation: 104722

The easiest way to deduplicate your list of dicts is to build a dictionary keyed by the unique field, which in this case is 'ASIN'. When you find a duplicate, you can select the one with the lower 'Merchant_1_Price' field:

by_asin = {}
for item in list_of_dicts:
    asin = item['ASIN']
    if (
        asin not in by_asin or
        float(item['Merchant_1_Price']) < float(by_asin[asin]['Merchant_1_Price'])
    ):
        by_asin[asin] = item

deduplicated_list_of_dicts = list(by_asin.values())

In the loop, we first extract the ASIN from the current item, since we use it several times. Then we check whether that ASIN is not yet in the by_asin dictionary or, if it is already there, whether the price of the new item is lower than the price of the stored item. In either case, we put the new item into by_asin (replacing the previous value, if there was one). Finally, list(by_asin.values()) gives one item per ASIN.
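To check this loop against the sample rows from the question, here is a minimal sketch that wraps it in a function. The name `dedupe_by_lowest_price` and its `key`/`price` parameters are hypothetical, and the rows are trimmed to the three fields that matter. It also shows why the `float()` conversion is needed: the CSV values are strings, and strings compare character by character, so `'129.00' < '61.64'` is True.

```python
def dedupe_by_lowest_price(rows, key='ASIN', price='Merchant_1_Price'):
    """Keep one row per `key`, choosing the row with the lowest numeric `price`.

    Rows whose key appears only once are kept unchanged. The output follows
    the order in which each key was first seen (dicts keep insertion order,
    and replacing a value does not move its key).
    """
    best = {}
    for row in rows:
        k = row[key]
        # Compare as floats: the CSV values are strings, and string
        # comparison is lexicographic ('129.00' < '61.64' is True).
        if k not in best or float(row[price]) < float(best[k][price]):
            best[k] = row
    return list(best.values())


# Trimmed copies of the sample rows from the question.
rows = [
    {'ASIN': 'B004B3M5UU', 'Merchant_1': 'Homedepot', 'Merchant_1_Price': '72.65'},
    {'ASIN': 'B004B3M5UU', 'Merchant_1': 'Overstock', 'Merchant_1_Price': '61.64'},
    {'ASIN': 'B00N0A4S1O', 'Merchant_1': 'Homedepot', 'Merchant_1_Price': '129.00'},
]

for row in dedupe_by_lowest_price(rows):
    print(row['ASIN'], row['Merchant_1'], row['Merchant_1_Price'])
# B004B3M5UU Overstock 61.64
# B00N0A4S1O Homedepot 129.00

print('129.00' < '61.64')  # True: string comparison, not numeric
```

Because the comparison is a strict `<`, a later row with the same price does not replace an earlier one, so on a tie the first row in the file is kept.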

Upvotes: 1
