amanb

Reputation: 5473

Large JSON file: string search in dict values

I have a large JSON file with about 30,000 items which has items like:

d = {"1102344": "Install 3245 xxx", "23456": "Install 7896 zzz", "90887": "Install 6655 ddd"}

I've been trying to get the key, value for items that match num and query as in the example code below:

    def test(num, query):
        l = [(k,v) for k,v in d.items() if num in v and v.strip().startswith(query)]
        return l
    test('3245','Install')
    # Output: [('1102344', 'Install 3245 xxx')]

The above code works as the dict d has few items. However, when I run this for my dataset, I get the following error:

argument of type `bool` is not iterable

I've searched around for help on SO, and many answers point towards using ijson, but I'm not allowed to install third-party libraries. Is there a memory-efficient alternative way to search for substrings in the dict values? I've run out of options after trying different things, and I'm not sure why I'm getting this error. The JSON is ordered and valid.

Just to let you know, it used to work before, but now I get this error 3 times out of 5. So the error appears intermittently, but too frequently for the app to do its job. There has been no change in the code or in the structure of the JSON file; however, the JSON file has grown in size, which makes me think that could be the reason.

Upvotes: 1

Views: 535

Answers (1)

Filip Młynarski

Reputation: 3612

The size of the JSON isn't the problem here. The problem is probably that some of the values in your dict are bools (True or False), so you cannot treat them like strings: the membership test num in v fails on a bool, and strip() or startswith() would not work on one either. Here I added a '12345': False entry to your dict to induce this error.

d = {"1102344": "Install 3245 xxx", '12345': False, "23456": "Install 7896 zzz", "90887": "Install 6655 ddd"}

def test(num, query):
    l = [(k,v) for k,v in d.items() if num in v and v.strip().startswith(query)]
    return l
print(test('3245','Install'))

Output:

TypeError: argument of type 'bool' is not iterable

What you could do in your function is first check that the value isn't a bool, or better, that it is a string.

l = [(k,v) for k,v in d.items() if type(v) is str and num in v and v.strip().startswith(query)]

Output:

[('1102344', 'Install 3245 xxx')]
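To confirm that non-string values are really what trips up the search, it can help to list them before filtering. The following is a minimal sketch, not a definitive implementation: the helper names find_non_string_values and search are made up for illustration, and the inline JSON string is sample data standing in for the real file. It uses isinstance(v, str) rather than a type() comparison, which is the more common idiom for this kind of check.

```python
import json


def find_non_string_values(data):
    """Return (key, value) pairs whose value is not a str (hypothetical helper)."""
    return [(k, v) for k, v in data.items() if not isinstance(v, str)]


def search(data, num, query):
    """Return (key, value) pairs whose value is a str that contains num
    and, after stripping whitespace, starts with query (hypothetical helper)."""
    return [
        (k, v)
        for k, v in data.items()
        if isinstance(v, str) and num in v and v.strip().startswith(query)
    ]


# Sample data standing in for the real file (assumption: the real file
# is a flat JSON object like this one, with an occasional non-string value).
raw = (
    '{"1102344": "Install 3245 xxx", "12345": false, '
    '"23456": "Install 7896 zzz", "90887": "Install 6655 ddd"}'
)
d = json.loads(raw)

print(find_non_string_values(d))     # [('12345', False)]
print(search(d, '3245', 'Install'))  # [('1102344', 'Install 3245 xxx')]
```

If find_non_string_values returns anything for the real dataset, those are the entries causing the TypeError, and it may be worth checking how they got into the file.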

Upvotes: 1
