Nataly Firstova

Reputation: 821

Remove duplicate values from list of nested dictionaries

I have a list of dictionaries with a nested structure, and I need to remove all duplicate values. I'm new to Python and can't solve this task. Can anyone help?

My list looks like:

[  
   {  
      "task_id":123,
      "results":[  
         {  
            "url":"site.com",
            "date":"04.18.2019"
         },
         {  
            "url":"another_site.com",
            "date":"04.18.2019"
         },
         {  
            "url":"site1.com",
            "date":"04.18.2019"
         }
      ]
   },
   {  
      "task_id":456,
      "results":[  
         {  
            "url":"site3.com",
            "date":"04.18.2019"
         },
         {  
            "url":"site.com",
            "date":"04.18.2019"
         }
      ]
   },
   {  
      "task_id":789,
      "results":[  
         {  
            "url":"site7.com",
            "date":"04.18.2019"
         },
         {  
            "url":"site9.com",
            "date":"04.18.2019"
         },
         {  
            "url":"site.com",
            "date":"04.18.2019"
         }
      ]
   }
]

Each url (for example site.com) should appear only once. If a url value is duplicated, keep its first occurrence and exclude the later result dicts that repeat it.

As a result:
- task 123 keeps all 3 dicts in results
- task 456 keeps 1 dict in results (site.com excluded)
- task 789 keeps 2 dicts in results (site.com excluded)
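To make the rule concrete, here is a quick sketch that counts how often each url appears across all tasks. It uses `collections.Counter`, which is a helper chosen for illustration and not part of the question, and the data is trimmed to just the url keys:

```python
from collections import Counter

# The question's data, trimmed to the "url" values of each task.
tasks = [
    {"task_id": 123, "results": [{"url": "site.com"}, {"url": "another_site.com"}, {"url": "site1.com"}]},
    {"task_id": 456, "results": [{"url": "site3.com"}, {"url": "site.com"}]},
    {"task_id": 789, "results": [{"url": "site7.com"}, {"url": "site9.com"}, {"url": "site.com"}]},
]

# Count every url across all tasks; a count above 1 marks a duplicate.
counts = Counter(r["url"] for t in tasks for r in t["results"])
duplicated = sorted(url for url, n in counts.items() if n > 1)
print(duplicated)  # ['site.com']
```

Only site.com is duplicated (3 occurrences), which matches the expected per-task counts listed above.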

The desired output should look like this:

[  
   {  
      "task_id":123,
      "results":[  
         {  
            "url":"site.com",
            "date":"04.18.2019"
         },
         {  
            "url":"another_site.com",
            "date":"04.18.2019"
         },
         {  
            "url":"site1.com",
            "date":"04.18.2019"
         }
      ]
   },
   {  
      "task_id":456,
      "results":[  
         {  
            "url":"site3.com",
            "date":"04.18.2019"
         }
      ]
   },
   {  
      "task_id":789,
      "results":[  
         {  
            "url":"site7.com",
            "date":"04.18.2019"
         },
         {  
            "url":"site9.com",
            "date":"04.18.2019"
         }
      ]
   }
]

Upvotes: 1

Views: 605

Answers (3)

Ajax1234

Reputation: 71451

You can use a list comprehension that drops any result already present in an earlier task's results:

d = [{'task_id': 123, 'results': [{'url': 'site.com', 'date': '04.18.2019'}, {'url': 'another_site.com', 'date': '04.18.2019'}, {'url': 'site1.com', 'date': '04.18.2019'}]}, {'task_id': 456, 'results': [{'url': 'site3.com', 'date': '04.18.2019'}, {'url': 'site.com', 'date': '04.18.2019'}]}, {'task_id': 789, 'results': [{'url': 'site7.com', 'date': '04.18.2019'}, {'url': 'site9.com', 'date': '04.18.2019'}, {'url': 'site.com', 'date': '04.18.2019'}]}]
new_d = [{**a, 'results':[c for c in a['results'] if all(c not in b['results'] for b in d[:i])]} for i, a in enumerate(d)]
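Note that `c not in b['results']` compares whole result dicts, so two entries with the same url but different dates are not treated as duplicates. This works for the question's data only because every date is identical. The sketch below uses hypothetical data with differing dates to show the difference, along with a variant that compares only the "url" key:

```python
# Hypothetical data: the second "site.com" entry has a different date.
d = [
    {"task_id": 123, "results": [{"url": "site.com", "date": "04.18.2019"}]},
    {"task_id": 456, "results": [{"url": "site.com", "date": "04.19.2019"}]},
]

# Comprehension from the answer: compares complete dicts, so the second entry survives.
by_dict = [{**a, "results": [c for c in a["results"]
                             if all(c not in b["results"] for b in d[:i])]}
           for i, a in enumerate(d)]

# Variant: compare only against the urls of earlier tasks.
by_url = [{**a, "results": [c for c in a["results"]
                            if all(c["url"] not in {x["url"] for x in b["results"]}
                                   for b in d[:i])]}
          for i, a in enumerate(d)]

print(len(by_dict[1]["results"]))  # 1
print(len(by_url[1]["results"]))   # 0
```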

Output:

[
    {
        "task_id": 123,
        "results": [
            {
                "url": "site.com",
                "date": "04.18.2019"
            },
            {
                "url": "another_site.com",
                "date": "04.18.2019"
            },
            {
                "url": "site1.com",
                "date": "04.18.2019"
            }
        ]
    },
    {
        "task_id": 456,
        "results": [
            {
                "url": "site3.com",
                "date": "04.18.2019"
            }
        ]
    },
    {
        "task_id": 789,
        "results": [
            {
                "url": "site7.com",
                "date": "04.18.2019"
            },
            {
                "url": "site9.com",
                "date": "04.18.2019"
            }
        ]
    }
]

Upvotes: 0

AKASH

Reputation: 1

people = {
    1: {'name': 'John'},
    2: {'name': 'Marie'},
    3: {'name': 'Ann'},
    4: {'name': 'John'},
}
print(people)

# Keep a value only if an equal value has not been stored yet.
unique = {}
for key, value in people.items():
    if value not in unique.values():
        unique[key] = value
print(unique)

Try this.
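The check `value not in unique.values()` compares whole dicts and rescans every stored value on each iteration. If duplicates should be judged by a single field (here, hypothetically, "name"), a common alternative is a set of keys you have already seen. A minimal sketch:

```python
people = {1: {"name": "John"}, 2: {"name": "Marie"}, 3: {"name": "Ann"}, 4: {"name": "John"}}

# Names already kept; set membership is O(1) on average,
# unlike scanning unique.values() on every iteration.
seen_names = set()
unique = {}
for key, value in people.items():
    if value["name"] not in seen_names:
        seen_names.add(value["name"])
        unique[key] = value

print(unique)  # {1: {'name': 'John'}, 2: {'name': 'Marie'}, 3: {'name': 'Ann'}}
```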

Upvotes: -1

FindOutIslamNow

Reputation: 1236

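Let results be your list of tasks:

seen_urls = set()   # urls already kept
final = []          # flat list holding the first result dict for each url
for task in results:
    for res in task["results"]:
        if res["url"] not in seen_urls:
            seen_urls.add(res["url"])
            final.append(res)
print(final)

Note that final is a flat list of result dicts, so the task_id grouping is not kept.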
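The loop above returns a flat list, so the task_id grouping shown in the desired output is lost. Here is a minimal sketch that applies the same first-occurrence-wins set check while keeping each task. It assumes the question's data shape, and the function name dedupe_urls is hypothetical:

```python
def dedupe_urls(tasks):
    """Keep only the first occurrence of each url, preserving the task structure."""
    seen = set()  # urls already kept, in an earlier task or earlier in this one
    out = []
    for task in tasks:
        kept = []
        for res in task["results"]:
            if res["url"] not in seen:
                seen.add(res["url"])
                kept.append(res)
        # Build a new task dict so the input list is left unmodified.
        out.append({**task, "results": kept})
    return out


results = [
    {"task_id": 123, "results": [
        {"url": "site.com", "date": "04.18.2019"},
        {"url": "another_site.com", "date": "04.18.2019"},
        {"url": "site1.com", "date": "04.18.2019"}]},
    {"task_id": 456, "results": [
        {"url": "site3.com", "date": "04.18.2019"},
        {"url": "site.com", "date": "04.18.2019"}]},
    {"task_id": 789, "results": [
        {"url": "site7.com", "date": "04.18.2019"},
        {"url": "site9.com", "date": "04.18.2019"},
        {"url": "site.com", "date": "04.18.2019"}]},
]

deduped = dedupe_urls(results)
print([len(t["results"]) for t in deduped])  # [3, 1, 2]
```

Because set membership checks are constant time on average, this is a single linear pass over all results. The list comprehension in the first answer instead rescans earlier tasks for every entry.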

Upvotes: 3
