Terry Chen

Reputation: 11

Getting keys through values from nested json file using Python

I would like to extract the key paths whose value is "Wally" from a nested object, using the program below:

arr = []
sub_arr = []
def extract(obj, sub_arr, val):
    if isinstance(obj, dict):
        for k, v in obj.items():
            if isinstance(v, (dict, list)):    
                sub_arr.append(k)            
                extract(v, sub_arr, val)
            elif v == val:
                sub_arr.append(k)
                arr.append(sub_arr)
                sub_arr = []
    elif isinstance(obj, list):
        for item in obj:
            if isinstance(item, (dict, list)):
                sub_arr.append(obj.index(item))
                extract(item, sub_arr, val)      
            elif item == val:
                sub_arr.append(obj.index(item))
                arr.append(sub_arr)    
                sub_arr = []
    return arr

obj =  {
        "News": [
            {
                "Title": "NewsA",
                "Tags": ["Gossiping"],
                "Date": "2021/06/26",
                "Detail": {
                    "Author": "Wally",
                    "Content": "Hello World"
                }
            },
            {
                "Title": "NewsB",
                "Tags": ["Social", "Wally"],
                "Date": "2021/06/27",
                "Detail": {
                    "Author": "Andy",
                    "Content": "Taiwan NO.1"
                }
            }
        ]
    }
print(extract(obj, sub_arr, "Wally"))

This is the best result I've got so far:

[
  ['News', 0, 'Tags', 'Detail', 'Author', 1, 'Tags', 1, 'Detail']
, ['News', 0, 'Tags', 'Detail', 'Author', 1, 'Tags', 1, 'Detail']
]

My desired result would look like this:

[['News', 0, 'Detail', 'Author'], ['News', 1, 'Tags', 1]]

I'm pretty stuck here. Is there something I've missed? I would appreciate a little help.

Upvotes: 1

Views: 279

Answers (3)

Michiel Aarts

Reputation: 71

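Lists are mutable, and the same list object is passed along to every recursive call of your extract function. In your case, every call appends to that one shared sub_arr, and arr stores references to it rather than copies, which explains the result you get. The sub_arr = [] inside the function only rebinds the local name, so it does not reset the list the callers hold. Always be careful when passing lists around in this manner.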
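To make the aliasing concrete, here is a minimal sketch that is separate from the question's code. All names in it (results, path, reset, snapshots) are made up for illustration.

```python
# Minimal sketch of list aliasing; every name here is illustrative only.
results = []
path = []

path.append("News")
results.append(path)      # stores a reference to path, not a copy
path.append(0)
results.append(path)      # the very same object again

print(results)                   # [['News', 0], ['News', 0]]
print(results[0] is results[1])  # True

# Rebinding a parameter inside a function does not touch the caller's list.
def reset(p):
    p = []                # only the local name p now refers to a new list
    return p

reset(path)
print(path)               # ['News', 0]

# Copying at the moment of storage keeps each snapshot independent.
snapshots = []
path2 = ["News"]
snapshots.append(list(path2))
path2.append(1)
snapshots.append(list(path2))
print(snapshots)          # [['News'], ['News', 1]]
```

Both entries in results end up identical because they are one list seen twice, which is the same effect as the two identical key paths in the question's output.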

A solution is to create a new list for each function call of extract, for example:

arr = []
sub_arr = []
def extract(obj, sub_arr, val):
    if isinstance(obj, dict):
        for k, v in obj.items():
            found_arr = [*sub_arr, k]
            if isinstance(v, (dict, list)):
                extract(v, found_arr, val)
            elif v == val:
                arr.append(found_arr)
    elif isinstance(obj, list):
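        # enumerate gives the real position; obj.index(item) would return
        # the first matching position when the list contains duplicates
        for i, item in enumerate(obj):
            found_arr = [*sub_arr, i]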
            if isinstance(item, (dict, list)):
                extract(item, found_arr, val)
            elif item == val:
                arr.append(found_arr)
    return arr

obj = {
        "News": [
            {
                "Title": "NewsA",
                "Tags": ["Gossiping"],
                "Date": "2021/06/26",
                "Detail": {
                    "Author": "Wally",
                    "Content": "Hello World"
                }
            },
            {
                "Title": "NewsB",
                "Tags": ["Social", "Wally"],
                "Date": "2021/06/27",
                "Detail": {
                    "Author": "Andy",
                    "Content": "Taiwan NO.1"
                }
            }
        ]
    }
print(extract(obj, sub_arr, "Wally"))

Which yields the desired answer:

[['News', 0, 'Detail', 'Author'], ['News', 1, 'Tags', 1]]
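One caveat about computing list positions with obj.index(item), as the question's code does: index returns the position of the first equal element, so repeated values in a list would all be reported at the same position. The sample data has no duplicates, so it does not show up there. A small sketch with a hypothetical tags list:

```python
# Hypothetical list with a repeated value, for illustration only.
tags = ["Wally", "Social", "Wally"]

# list.index always reports the first equal element
by_index = [tags.index(t) for t in tags if t == "Wally"]
print(by_index)        # [0, 0]

# enumerate reports the actual position of each element
by_enumerate = [i for i, t in enumerate(tags) if t == "Wally"]
print(by_enumerate)    # [0, 2]
```

Using enumerate also avoids the extra linear scan that index performs for every item.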

Upvotes: 1

ssblack

Reputation: 21

In your code

if isinstance(v, (dict, list)):
     sub_arr.append(k)

you append k every time you encounter a nested dict or list, whether or not it contains the value, so the result is wrong. Furthermore, passing sub_arr through many layers of recursion makes the code harder to control.

I have changed some of the logic while following the structure of your code.

def extract(obj, val):
    result = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            if isinstance(v, str):
                if v == val:
                    result.append([k])
                    continue
            exts = extract(v, val)
            for e in exts:
                result.append([k]+e)
    if isinstance(obj, list):
        # enumerate keeps the index correct even after a match;
        # a manual counter would be skipped by the `continue` below
        for count, item in enumerate(obj):
            if isinstance(item, str):
                if item == val:
                    result.append([count])
                    continue
            exts = extract(item, val)
            for e in exts:
                result.append([count] + e)
    return result

Upvotes: 0

Hans Musgrave

Reputation: 7111

The biggest flaw is re-use of sub_arr. Any recursive mechanism for structuring this program is going to have to intern or copy partial key paths somehow as it descends. By sharing the same object you'll have every path part strewn about the same result (as you see in your current solution where both key paths are identical).

In programs like this I also tend to like creating different values for my inputs as early as possible rather than branching the behavior of the program based on input type. I find it easier to understand and modify. In particular, notice how a top-level feature of your solution is branching on isinstance(obj, ...), whereas in my solution we can turn both lists and dicts into some kind of iterable of pairs of things for k, v in items(obj).

def items(obj):
  if isinstance(obj, dict):
    return obj.items()
  if isinstance(obj, list):
    return enumerate(obj)

def _extract(obj, target):
  # base case
  if not items(obj):
    if obj == target:
      yield []

  # recurse
  else:
    for k, v in items(obj):
      for L in _extract(v, target):
        # could use `yield [k, L]` here and
        # list(map(unpack, _extract(obj, target))) below
        # to avoid quadratic copying for long key paths
        yield [k, *L]

def extract(obj, target):
  return list(_extract(obj, target))

# if you opt for `yield [k, L]` solution
def unpack(L):
  result = []
  while L:
    k, L = L
    result.append(k)
  return result
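For reference, the `yield [k, L]` variant mentioned in the comments could be put together roughly as below. This is only a sketch: the names _extract_linked and extract_linked are made up here, the sample data is a trimmed version of the question's, and the leaf test uses `pairs is None` instead of `not items(obj)`, so an empty dict is treated as an empty container rather than as a leaf.

```python
def items(obj):
    # Uniform (key, value) view of dicts and lists; None for leaves.
    if isinstance(obj, dict):
        return obj.items()
    if isinstance(obj, list):
        return enumerate(obj)
    return None

def _extract_linked(obj, target):
    pairs = items(obj)
    if pairs is None:
        # base case: a leaf that matches yields an empty path
        if obj == target:
            yield []
    else:
        for k, v in pairs:
            for rest in _extract_linked(v, target):
                # nest instead of copying: [k1, [k2, [k3, []]]]
                yield [k, rest]

def unpack(linked):
    # Flatten [k1, [k2, [k3, []]]] into [k1, k2, k3].
    result = []
    while linked:
        k, linked = linked
        result.append(k)
    return result

def extract_linked(obj, target):
    return [unpack(p) for p in _extract_linked(obj, target)]

# Trimmed version of the question's data.
obj = {"News": [
    {"Tags": ["Gossiping"], "Detail": {"Author": "Wally"}},
    {"Tags": ["Social", "Wally"], "Detail": {"Author": "Andy"}},
]}
print(extract_linked(obj, "Wally"))
# [['News', 0, 'Detail', 'Author'], ['News', 1, 'Tags', 1]]
```

Each level only wraps the inner path in a two-element list, so the path is copied once in unpack rather than once per level of recursion.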

As something of an aside, this can be a useful idea when exploring unknown API responses and whatnot. Restructure it to take a matching function predicate rather than just some value you're looking for and it becomes a lot more general purpose.

def _extract(obj, matches):
  if not items(obj):
    if matches(obj):
      yield []
  ...

# existing behavior
extract(obj, lambda x: x=='Wally')

# just a utility function
def false_on_error(f):
  def _f(*a, **k):
    try:
      return f(*a, **k)
    except Exception:
      return False
  return _f

# find the keys giving values that kind of look like an email
import re
prog = re.compile(r'@.*?\.')
extract(obj, false_on_error(lambda x: prog.search(x)))

# you saw a value on the site and want to know if anything
# kind of looks like it
#
# bad practice to depend on `or` short-circuiting like this
# to prevent errors...
extract(obj, false_on_error(lambda x: x == 98.6 or '98.6' in x))
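Put together, a self-contained version of the predicate idea might look like the sketch below. The sample dict is invented for illustration, and the leaf test uses `pairs is None` as a simplifying assumption, which treats an empty dict as a container rather than a leaf.

```python
import re

def items(obj):
    # Uniform (key, value) view of dicts and lists; None for leaves.
    if isinstance(obj, dict):
        return obj.items()
    if isinstance(obj, list):
        return enumerate(obj)
    return None

def _extract(obj, matches):
    pairs = items(obj)
    if pairs is None:
        if matches(obj):
            yield []
    else:
        for k, v in pairs:
            for rest in _extract(v, matches):
                yield [k, *rest]

def extract(obj, matches):
    return list(_extract(obj, matches))

def false_on_error(f):
    # Treat any exception raised by the predicate as "no match".
    def _f(*a, **k):
        try:
            return f(*a, **k)
        except Exception:
            return False
    return _f

# Hypothetical API-like response, for illustration only.
sample = {"users": [
    {"name": "Ann", "contact": "ann@example.com", "age": 31},
    {"name": "Bob", "contact": None, "age": 98.6},
]}

prog = re.compile(r'@.*?\.')
emails = extract(sample, false_on_error(lambda x: prog.search(x)))
print(emails)   # [['users', 0, 'contact']]

temps = extract(sample, false_on_error(lambda x: x == 98.6 or '98.6' in x))
print(temps)    # [['users', 1, 'age']]
```

The wrapper matters here because the predicates are applied to every leaf, including numbers and None, where prog.search or the in test would raise a TypeError.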

Upvotes: 0
