flybonzai

Reputation: 3931

Cleaning characters out of an arbitrarily nested dictionary (from JSON)

I want to clean a dictionary built from a JSON object by removing all the \n and | characters, so that I can use csv.DictWriter to write it out as a line in a flat file for a copy into an AWS database. I've never used recursion on a dict before, and I'm struggling to figure out how to move through all the levels until I reach a single string, and then iterate through a list of items that I want to replace. With my current code I get an IndexError saying my string index is out of range. Here is my function:

def purge_items(in_iter, items):
    if isinstance(in_iter, dict):
        for k, v in in_iter:
            if isinstance(v, dict):
                purge_items(k[v], items)
    elif isinstance(in_iter, list):
        for item in items:
            for elem in in_iter:
                try:
                    elem.replace(item[0], item[1])
                except AttributeError:
                    continue
    else:
        try:
            for item in items:
                in_iter.replace(item[0], item[1])
        except AttributeError:
            return

This function expects a dictionary with arbitrary nesting depth (once it works for a dictionary, I want to generalize it to accept any mutable container), plus a list of replacement pairs of the form ('\n', ' '), where the first entry is the text to replace and the second entry is what to replace it with.

An example of the data I'm working with is below, with newlines included:

{'issuetype': {'avatarId': 22101,
                                      'description': 'A problem found in '
                                                     'production which impairs '
                                                     'or prevents the '
                                                     'functions of the '
                                                     'product.',
                                      'iconUrl': 'https://instructure.atlassian.net/secure/viewavatar?size=xsmall&avatarId=22101&avatarType=issuetype',
                                      'id': '1',
                                      'name': 'Bug',
                                      'self': 'https://instructure.atlassian.net/rest/api/2/issuetype/1',
                                      'subtask': False}}

Upvotes: 0

Views: 84

Answers (1)

JustMe

Reputation: 710

OK, there are plenty of modules for handling and manipulating text in general, to mention only a few:

  • ast and its ast.literal_eval()
  • textwrap and its textwrap.dedent()
  • json

but in your case something simple will do:

test = """
    {'issuetype': {'avatarId': 22101,
                                      'description': 'A problem found in '
                                                     'production which impairs '
                                                     'or prevents the '
                                                     'functions of the '
                                                     'product.',
                                      'iconUrl': 'https://instructure.atlassian.net/secure/viewavatar?size=xsmall&avatarId=22101&avatarType=issuetype',
                                      'id': '1',
                                      'name': 'Bug',
                                      'self': 'https://instructure.atlassian.net/rest/api/2/issuetype/1',
                                      'subtask': False}
                                      }
    """

print ("".join([obj.strip().replace('|', '') for obj in test.split("\n")]))

output

{'issuetype': {'avatarId': 22101,'description': 'A problem found in ''production which impairs ''or prevents the ''functions of the ''product.','iconUrl': 'https://instructure.atlassian.net/secure/viewavatar?size=xsmall&avatarId=22101&avatarType=issuetype','id': '1','name': 'Bug','self': 'https://instructure.atlassian.net/rest/api/2/issuetype/1','subtask': False}}

That should suffice, shouldn't it?

Oops, not quite: the doubled quotes ('') left where adjacent string pieces were joined need to be removed too. Corrected version:

test_1 = "".join([obj.strip().replace('|', '') 
                 for obj in test.split("\n")])
test_2 = test_1.replace("''", "")
print (test_2)

output

{'issuetype': {'avatarId': 22101,'description': 'A problem found in production which impairs or prevents the functions of the product.','iconUrl': 'https://instructure.atlassian.net/secure/viewavatar?size=xsmall&avatarId=22101&avatarType=issuetype','id': '1','name': 'Bug','self': 'https://instructure.atlassian.net/rest/api/2/issuetype/1','subtask': False}}
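The snippets above clean the printed text of the dictionary. If the goal is to clean the dictionary object itself, as the question describes, a recursive version might look like the sketch below. Two things matter here: iterating over a dict directly yields only its keys (hence .items()), and str.replace() returns a new string rather than changing the original, so the result has to be kept. This version returns a cleaned copy instead of modifying its input; the sample record and the replacement pairs are made up for illustration.

```python
def purge_items(obj, replacements):
    """Return a copy of obj with every (old, new) pair applied to each string."""
    if isinstance(obj, dict):
        # Recurse into the values; keys are left untouched.
        return {key: purge_items(value, replacements) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        # Rebuild the same container type from the cleaned elements.
        return type(obj)(purge_items(elem, replacements) for elem in obj)
    if isinstance(obj, str):
        # Strings are immutable, so keep the result of each replace().
        for old, new in replacements:
            obj = obj.replace(old, new)
        return obj
    # Numbers, booleans, None and so on pass through unchanged.
    return obj


# Illustrative data only, loosely modelled on the structure in the question.
record = {
    "issuetype": {
        "avatarId": 22101,
        "description": "a|b\nc",
        "labels": ["one|two", "three\nfour"],
        "subtask": False,
    }
}

cleaned = purge_items(record, [("\n", " "), ("|", "")])
print(cleaned)
```

Note that the nested result still has to be flattened to a single level before csv.DictWriter can write it as one row.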

Upvotes: 1
