NAS

Reputation: 90

Python: Iterate JSON and remove items with specific criteria

I am trying to filter data from an API JSON response with Python and I am getting weird results. I would be glad if somebody could guide me on how to deal with this.

The main idea is to remove irrelevant data from the JSON and keep only the records associated with particular people, whose names I hold in a list. Here is a snippet of the JSON file:

{
  "result": [
    {
      "number": "Number1",
      "short_description": "Some Description",
      "assignment_group": {
        "display_value": "Some value",
        "link": "https://some_link.com"
      },
      "incident_state": "Closed",
      "sys_created_on": "2020-03-30 11:51:24",
      "priority": "4 - Low",
      "assigned_to": {
        "display_value": "John Doe",
        "link": "https://some_link.com"
      }
    },
    {
      "number": "Number2",
      "short_description": "Some Description",
      "assignment_group": {
        "display_value": "Some value",
        "link": "https://some_link.com"
      },
      "incident_state": "Closed",
      "sys_created_on": "2020-03-10 11:07:13",
      "priority": "4 - Low",
      "assigned_to": {
        "display_value": "Tyrell Greenley",
        "link": "https://some_link.com"
      }
    },
    {
      "number": "Number3",
      "short_description": "Some Description",
      "assignment_group": {
        "display_value": "Some value",
        "link": "https://some_link.com"
      },
      "incident_state": "Closed",
      "sys_created_on": "2020-03-20 10:23:35",
      "priority": "4 - Low",
      "assigned_to": {
        "display_value": "Delmar Vachon",
        "link": "https://some_link.com"
      }
    },
    {
      "number": "Number4",
      "short_description": "Some Description",
      "assignment_group": {
        "display_value": "Some value",
        "link": "https://some_link.com"
      },
      "incident_state": "Closed",
      "sys_created_on": "2020-03-30 11:51:24",
      "priority": "4 - Low",
      "assigned_to": {
        "display_value": "Samual Isham",
        "link": "https://some_link.com"
      }
    }
  ]
}

Here is the Python code:

import json

users_test = ['Ahmad Wickert', 'Dick Weston', 'Gerardo Salido', 'Rosendo Dewey', 'Samual Isham']

# Load JSON file
with open('extract.json', 'r') as input_file:
    input_data = json.load(input_file)


# Create a function to clear the data
def clear_data(data, users):
    """Filter out the data and leave only records for the names in the users_test list"""
    for elem in data:
        print(elem['assigned_to']['display_value'] not in users)
        if elem['assigned_to']['display_value'] not in users:
            print('Removing {} from JSON as not present in list of names.'.format(elem['assigned_to']['display_value']))
            data.remove(elem)
        else:
            print('Keeping the record for {} in JSON.'.format(elem['assigned_to']['display_value']))

    return data


cd = clear_data(input_data['result'], users_test)

And here is the output, which seems to iterate through only 2 of the items in the file:

True
Removing John Doe from JSON as not present in list of names.
True
Removing Delmar Vachon from JSON as not present in list of names.

Process finished with exit code 0

It seems that the problem is related to the .remove() method; however, I can't find another suitable way to delete the particular items that I do not need.

Here is the output of the iteration without applying the remove() method:

True
Removing John Doe from JSON as not present in list of names.
True
Removing Tyrell Greenley from JSON as not present in list of names.
True
Removing Delmar Vachon from JSON as not present in list of names.
False
Keeping the record for Samual Isham in JSON.

Process finished with exit code 0

Note: I have left the check for the name visible on purpose.

I would appreciate any ideas to sort out the situation.

Upvotes: 0

Views: 1704

Answers (3)

sahasrara62

Reputation: 11238

# data is the parsed JSON response (input_data in the question)
users_test = ['Ahmad Wickert', 'Dick Weston', 'Gerardo Salido', 'Rosendo Dewey', 'Samual Isham']

solution = []

for user in users_test:
    print(user)
    for value in data['result']:
        if user == value['assigned_to']['display_value']:
            solution.append(value)
print(solution)

For more efficient code, as asked by @NomadMonad:

solution = list(filter(lambda x: x['assigned_to']['display_value'] in users_test, data['result']))

Upvotes: 2

Josh Clark

Reputation: 1012

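You are modifying a list while iterating over it. Each call to remove() shifts the remaining items one position to the left, but the loop's internal index still advances, so the item that slides into the freed slot is skipped. Check out this blog post which describes this behavior.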
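A small sketch of the effect, using plain name strings instead of the question's records (the names `names`, `keep` and `visited` are only for illustration):

```python
# Removing from a list while looping over it skips elements.
names = ['John Doe', 'Tyrell Greenley', 'Delmar Vachon', 'Samual Isham']
keep = {'Samual Isham'}

visited = []
for name in names:
    visited.append(name)
    if name not in keep:
        # remove() shifts later items left, but the loop index still moves on
        names.remove(name)

print(visited)  # ['John Doe', 'Delmar Vachon']
print(names)    # ['Tyrell Greenley', 'Samual Isham']
```

Only two names are ever visited, which matches the output in the question, and 'Tyrell Greenley' survives even though it is not in the keep list.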

A safer way to do this is to iterate over a copy of your list and delete from the original list:

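import copy

def clear_data(data, users):
    """Filter out the data and leave only records for the names in the users list"""
    # Iterate over a copy so that removing from the original cannot skip elements
    for elem in copy.deepcopy(data):  # deepcopy handles nested dicts
        if elem['assigned_to']['display_value'] not in users:
            # remove() compares by equality, so the copied dict matches the original
            data.remove(elem)
    return data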

Upvotes: 1

NomadMonad

Reputation: 649

If you don't need to log info about the people you are removing, you could simply try:

filtered = [i for i in data['result'] if i['assigned_to']['display_value'] in users_test]
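If you do want the log messages from the question, one possible sketch is to build a new list instead of removing from the one being iterated. The `sample` records below are a trimmed, made-up version of the question's JSON, and turning the names into a set is my own addition for faster lookups:

```python
users_test = ['Ahmad Wickert', 'Dick Weston', 'Gerardo Salido', 'Rosendo Dewey', 'Samual Isham']


def clear_data(data, users):
    """Return a new list with only the records assigned to one of the users."""
    allowed = set(users)  # set membership tests are O(1) on average
    kept = []
    for elem in data:
        name = elem['assigned_to']['display_value']
        if name in allowed:
            print('Keeping the record for {} in JSON.'.format(name))
            kept.append(elem)
        else:
            print('Removing {} from JSON as not present in list of names.'.format(name))
    return kept


# Trimmed sample data shaped like the question's "result" list
sample = [
    {'number': 'Number1', 'assigned_to': {'display_value': 'John Doe'}},
    {'number': 'Number2', 'assigned_to': {'display_value': 'Tyrell Greenley'}},
    {'number': 'Number3', 'assigned_to': {'display_value': 'Delmar Vachon'}},
    {'number': 'Number4', 'assigned_to': {'display_value': 'Samual Isham'}},
]

result = clear_data(sample, users_test)
```

The original list is left untouched, so every record is visited exactly once. If other code holds a reference to the original list and must see the change, assigning `data[:] = kept` inside the function would update it in place.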


Upvotes: 3
