Nick

Reputation: 1444

Remove duplicate values across different JSON lists in Python

I know that there are a lot of questions about duplicates but I can't find a solution suitable for me.

I have a json structure like this:

    {
        "test": [
            {
                "name2": [
                    "Tik",
                    "eev",
                    "asdv",
                    "asdfa",
                    "sadf",
                    "Nick"
                ]
            },
            {
                "name2": [
                    "Tik",
                    "eev",
                    "123",
                    "r45",
                    "676",
                    "121"
                ]
            }
        ]
    }

I want to keep the first occurrence of each value across all the lists and remove any later duplicates.

Expected result:

    {
        "test": [
            {
                "name2": [
                    "Tik",
                    "eev",
                    "asdv",
                    "asdfa",
                    "sadf",
                    "Nick"
                ]
            },
            {
                "name2": [
                    "123",
                    "r45",
                    "676",
                    "121"
                ]
            }
        ]
    }

I tried using a temporary list (tmp) to check for duplicates, but it didn't work. I also can't find a way to convert the result back to JSON.

import json
with open('myjson') as access_json:
    read_data = json.load(access_json)

tmp = []
tmp2 = []
def get_synonyms():
    ingredients_access = read_data['test']
    for x in ingredients_access:
        for j in x['name2']:
            tmp.append(j)
            if j in tmp:
                tmp2.append(j)




get_synonyms()
print(len(tmp))
print(len(tmp2))

Upvotes: 1

Views: 117

Answers (3)

Ajax1234

Reputation: 71461

You can use recursion, with a single set of already-seen values shared across every nested list:

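def filter_d(d):
    seen = set()  # values already kept, shared across all nested lists

    def inner(_d):
        if isinstance(_d, dict):
            # Recurse into nested containers; leave scalar values untouched
            return {a: inner(b) if isinstance(b, (dict, list)) else b for a, b in _d.items()}
        _r = []
        for i in _d:
            if isinstance(i, (dict, list)):
                _r.append(inner(i))
            elif i not in seen:
                # First occurrence of this value: keep it and remember it
                _r.append(i)
                seen.add(i)
        return _r

    return inner(d)

import json
# data is the dictionary loaded from the JSON in the question
print(json.dumps(filter_d(data), indent=4))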

Output:

{
    "test": [
        {
            "name2": [
                "Tik",
                "eev",
                "asdv",
                "asdfa",
                "sadf",
                "Nick"
            ]
        },
        {
            "name2": [
                "123",
                "r45",
                "676",
                "121"
            ]
        }
    ]
}

Upvotes: 2

r.ook

Reputation: 13878

Here's a slightly hackish answer:

d = {'test': [{'name2': ['Tik', 'eev', 'asdv', 'asdfa', 'sadf', 'Nick']},
              {'name2': ['Tik', 'eev', '123', 'r45', '676', '121']}]}
s = set()
for l in d['test']:
    l['name2'] = [(v, s.add(v))[0] for v in l['name2'] if v not in s]

Output:

{'test': [{'name2': ['Tik', 'eev', 'asdv', 'asdfa', 'sadf', 'Nick']},
          {'name2': ['123', 'r45', '676', '121']}]}

This uses a set to track the values that have already been seen. The expression (v, s.add(v))[0] adds each new value to the set as a side effect and returns the value itself, so it ends up in the new list.
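To make the tuple expression less mysterious, here is a small sketch that takes it apart. The variable names (pair, names, unique) are only for illustration and are not part of the answer's code.

```python
s = set()
v = "Tik"

# s.add(v) mutates the set and returns None, so the tuple is (v, None)
pair = (v, s.add(v))
print(pair)     # ('Tik', None)
print(pair[0])  # Tik
print(s)        # {'Tik'}

# The same expression inside a comprehension keeps only unseen values,
# because the "if n not in s" filter runs before each value is added
names = ["eev", "Tik", "eev", "123"]
unique = [(n, s.add(n))[0] for n in names if n not in s]
print(unique)   # ['eev', '123']
```

Indexing with [0] simply discards the None returned by set.add, which is why the value itself lands in the list.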

Upvotes: 1

bootica

Reputation: 771

You append each value to tmp before checking whether it is in tmp, so the check is always true and every value also ends up in tmp2.
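A stripped-down sketch of that ordering problem, using a short made-up list instead of the JSON data. The names seen and duplicates are illustrative only.

```python
values = ["Tik", "eev", "Tik"]

# Order used in the question: append first, then check
tmp = []
tmp2 = []
for j in values:
    tmp.append(j)   # j is now in tmp...
    if j in tmp:    # ...so this check can never be False
        tmp2.append(j)
print(tmp2)  # ['Tik', 'eev', 'Tik']

# Checking before appending separates first occurrences from repeats
seen = []
duplicates = []
for j in values:
    if j in seen:
        duplicates.append(j)
    else:
        seen.append(j)
print(seen)        # ['Tik', 'eev']
print(duplicates)  # ['Tik']
```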

I changed the function a little bit to work for your specific test example:

def get_synonyms():
    test_list = []
    ingredients_access = read_data['test']
    used_values = []
    for x in ingredients_access:
        inner_tmp = []
        for j in x['name2']:
            if j not in used_values:
                inner_tmp.append(j)
                used_values.append(j)
        test_list.append({'name2':inner_tmp})
    return {'test': test_list}


result = get_synonyms()
print(result)

Output:

{'test': [{'name2': ['Tik', 'eev', 'asdv', 'asdfa', 'sadf', 'Nick']}, {'name2': ['123', 'r45', '676', '121']}]}
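The question also asks how to turn the result back into JSON. A minimal sketch with the standard json module follows; the result literal copies the output above, and the file name in the commented-out part is made up for illustration.

```python
import json

result = {'test': [{'name2': ['Tik', 'eev', 'asdv', 'asdfa', 'sadf', 'Nick']},
                   {'name2': ['123', 'r45', '676', '121']}]}

# json.dumps returns the dictionary as a JSON-formatted string
json_text = json.dumps(result, indent=4)
print(json_text)

# To write to a file instead, json.dump takes an open file object
# (the file name here is hypothetical):
# with open('myjson_deduped.json', 'w') as out_file:
#     json.dump(result, out_file, indent=4)
```

Printing the dictionary directly shows Python's repr with single quotes, which is not valid JSON; json.dumps produces double-quoted JSON text.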

Upvotes: 1
