Python: building complex nested lists within a dictionary

Question

I am looking at building lists of lists within a dictionary from an Excel spreadsheet.

My spreadsheet looks like this:

source_item_id	target_item_id	find_string	replace_string
source_id1	target_id1	abcd1	efgh1
source_id1	target_id1	ijkl1	mnop1
source_id1	target_id2	abcd2	efgh2
source_id1	target_id2	ijkl2	mnop2
source_id2	target_id3	qrst	uvwx
source_id2	target_id3	yzab	cdef
source_id2	target_id4	ghij	klmn
source_id2	target_id4	opqr	stuv

My output dictionary should look like this:

{
  "source_id1": [{
      "target_id1": [{
          "find_string": "abcd1",
          "replace_string": "efgh1"
      },
      {
          "find_string": "ijkl1",
          "replace_string": "mnop1"
      }]
  },
  {
      "target_id2": [{
          "find_string": "abcd2",
          "replace_string": "efgh2"
      },
      {
          "find_string": "ijkl2",
          "replace_string": "mnop2"
      }]
  }],
  "source_id2": [{
      "target_id3": [{
          "find_string": "qrst",
          "replace_string": "uvwx"
      },
      {
          "find_string": "yzab",
          "replace_string": "cdef"
      }]
  },
  {
      "target_id4": [{
          "find_string": "ghij",
          "replace_string": "klmn"
      },
      {
          "find_string": "opqr",
          "replace_string": "stuv"
      }]
  }]
}

With the following code I only get the last values in each of the lists:

import xlrd
xls_path = r"C:\data\ItemContent.xlsx"
book = xlrd.open_workbook(xls_path)
sheet_find_replace = book.sheet_by_index(1)
find_replace_dict = dict() 
for line in range(1, sheet_find_replace.nrows):
    source_item_id = sheet_find_replace.cell(line, 0).value
    target_item_id = sheet_find_replace.cell(line, 1).value
    find_string = sheet_find_replace.cell(line, 2).value
    replace_sting = sheet_find_replace.cell(line, 3).value
    find_replace_list = [{"find_string": find_string, "replace_sting": replace_sting}]
    find_replace_dict[source_item_id] = [target_item_id]
    find_replace_dict[source_item_id].append(find_replace_list)
print(find_replace_dict)

--> result

{
    "source_id1": ["target_id2", [{
        "find_string": "ijkl2",
        "replace_sting": "mnop2"
      }
    ]],
    "source_id2": ["target_id4", [{
        "find_string": "opqr",
        "replace_sting": "stuv"
      }
    ]]
}

Nathaniel Ford · Accepted Answer

Your problem is complicated by the fact that each source id maps to a list of single-key dictionaries. You can still follow a simple pattern: parse each line for the relevant items, then use them to find where to append to an existing list or, if none exists yet, create a new one:

from typing import List, Tuple

def process_line(line: int) -> Tuple[str, str, dict]:
    # Read one row; relies on the module-level sheet_find_replace opened below
    source_item_id = sheet_find_replace.cell(line, 0).value
    target_item_id = sheet_find_replace.cell(line, 1).value
    find_string = sheet_find_replace.cell(line, 2).value
    replace_string = sheet_find_replace.cell(line, 3).value
    return source_item_id, target_item_id, {
        "find_string": find_string,
        "replace_string": replace_string
    }

def find_target(target: str, ls: List[dict]) -> int:
    # Find the index of the dictionary keyed by the target id
    for i, item in enumerate(ls):
        if target in item:
            return i
    return -1  # Marker for "not found"

import xlrd
xls_path = r"C:\data\ItemContent.xlsx"
book = xlrd.open_workbook(xls_path)
sheet_find_replace = book.sheet_by_index(1)
result_dict = dict() 
for line in range(1, sheet_find_replace.nrows):
    source, target, replacer = process_line(line)
    # You can check here that the above three are correct
    source_list = result_dict.get(source, [])  # Leverage the default value of the get function
    target_idx = find_target(target, source_list)
    target_dict = source_list[target_idx] if target_idx >= 0 else {}
    replace_list = target_dict.get(target, [])
    replace_list.append(replacer)
    
    target_dict[target] = replace_list
    if target_idx >= 0:
        source_list[target_idx] = target_dict
    else:
        source_list.append(target_dict)

    result_dict[source] = source_list

print(result_dict)
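
For context, the loop in the question keeps only the last row because find_replace_dict[source_item_id] = [target_item_id] rebinds the key on every iteration, discarding whatever was stored before. A minimal sketch of the difference, assuming a few hard-coded rows in place of the spreadsheet (the names rows, overwritten and accumulated are only illustrative):

```python
# Hypothetical in-memory rows standing in for the spreadsheet
rows = [
    ("source_id1", "target_id1", "abcd1", "efgh1"),
    ("source_id1", "target_id1", "ijkl1", "mnop1"),
    ("source_id1", "target_id2", "abcd2", "efgh2"),
]

overwritten = {}
accumulated = {}
for source, target, find, replace in rows:
    entry = {"find_string": find, "replace_string": replace}
    # Plain assignment replaces the previous value, so earlier rows are lost
    overwritten[source] = [target, [entry]]
    # setdefault creates the list once and reuses it on later rows
    accumulated.setdefault(source, []).append((target, entry))

print(overwritten)
print(len(accumulated["source_id1"]))
```

Here overwritten ends up holding only the target_id2 row, while accumulated keeps all three.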

I would note that if each source id pointed to a dictionary rather than a list, this could be radically simplified: we wouldn't need to search the list for an already-existing item and then awkwardly replace or append as needed. If you can change this constraint (remember, you can always convert a dictionary to a list downstream), I would consider doing that.
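
As a rough sketch of that simplification, assuming the rows have already been read into tuples (the hard-coded rows list and the names nested and as_lists are my own, not part of the question):

```python
# Hypothetical in-memory rows standing in for the spreadsheet
rows = [
    ("source_id1", "target_id1", "abcd1", "efgh1"),
    ("source_id1", "target_id1", "ijkl1", "mnop1"),
    ("source_id1", "target_id2", "abcd2", "efgh2"),
    ("source_id1", "target_id2", "ijkl2", "mnop2"),
    ("source_id2", "target_id3", "qrst", "uvwx"),
    ("source_id2", "target_id3", "yzab", "cdef"),
    ("source_id2", "target_id4", "ghij", "klmn"),
    ("source_id2", "target_id4", "opqr", "stuv"),
]

# Dictionary of dictionaries: source id -> target id -> list of replacements
nested = {}
for source, target, find, replace in rows:
    nested.setdefault(source, {}).setdefault(target, []).append(
        {"find_string": find, "replace_string": replace}
    )

# Downstream conversion to the list-of-single-key-dictionaries shape from the question
as_lists = {
    source: [{target: pairs} for target, pairs in targets.items()]
    for source, targets in nested.items()
}

print(as_lists)
```

Because dictionaries preserve insertion order in Python 3.7+, the converted lists follow the order in which target ids first appear in the rows.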
