Compare and replace values from excel columns into a list of dictionaries in python

Question

I am having a requirement where I need to fetch a specific column from a excel file, compare the exact string against a list of dictionaries and replace it if a match is found.

E.g The sample excel sheet looks like this (test.xlsx with one sheet/tab as 'tab1'):-

col1 col2            col3

A1   1.1.1.1:1111    1.1.1.1:1111,2.2.2.2:1111,3.3.3.3:1111,4.4.4.4:1111
B2   2.2.2.2:2222    2.2.2.2:2222,5.5.5.5:5555,6.6.6.6:6666

and so on

My current dictionary looks like this:-

test_dict = [{'col1': 'A1', 'K2': ['H1', 'H2'], 'K3': ['1.1.1.1:1111','0.0.0.0:0000']},{'col1': 'B2', 'K2': ['H4', 'H5'], 'K3': ['2.2.2.2:2222','8.8.8.8:8888']}]

and so on

So, what I would require is take col2 from excel, compare it against the test_dict dictionary, if match is found, replace it with col3

My expected dictionary output would need to be:-

test_dict = [{'col1': 'A1', 'K2': ['H1', 'H2'], 'K3': ['1.1.1.1:1111','2.2.2.2:1111','3.3.3.3:1111','4.4.4.4:1111','0.0.0.0:0000']},{'col1': 'B2', 'K2': ['H4', 'H5'], 'K3': ['2.2.2.2:2222','5.5.5.5:5555','6.6.6.6:6666','8.8.8.8:8888']}]

I am new to python and needed some advice as how to achieve it. I have tried to convert the excel in another dictionary & then compare it with existing dictionary and replace it but couldn't achieve the result. Kindly help and suggest.

VectorVictor · Accepted Answer

I have put some example code below which may work for you depending on the type and shape of your input excel data.

test_list = [{'col1': 'A1',
          'K2': ['H1', 'H2'],
          'K3': ['1.1.1.1:1111','0.0.0.0:0000']},
          {'col1': 'B2',
           'K2': ['H4', 'H5'],
           'K3': ['2.2.2.2:2222','8.8.8.8:8888']}
           ]

example_data = [['A1', '1.1.1.1:1111', '1.1.1.1:1111,2.2.2.2:1111,3.3.3.3:1111,4.4.4.4:1111'],
                ['B2',  '2.2.2.2:2222', '2.2.2.2:2222,5.5.5.5:5555,6.6.6.6:6666']
                ]

def compare_K3(test_list, col2_data, col3_data):
    """This function assumes that relevant data is always in K3"""
    for i, row_dict in enumerate(test_list):  # i will be list index, row_dict will be col_dict
        data_list = row_dict.get('K3', [])  # empty list if 'K3' is not in the dict
        for ref_item in data_list:  # will be '1.1.1.1:1111' or similar
            if col2_data == ref_item:
                k3_data = test_list[i]['K3']
                col3_data_items = col3_data.split(',')
                for new_item in col3_data_items:
                    if new_item not in k3_data:
                        k3_data.append(new_item)
                # the sorting line below is optional, added to improve readability
                # if order is unimportant, test_list[i]['K3'] will be updated without this line since
                # k3_data and test_list[i]['K3'] are the same object
                test_list[i]['K3'] = sorted(k3_data)
                # change return to break if there are multiple instances of col2 data to change
                return

def compare_general(test_list, col2_data, col3_data):
    """This function does not make any assumptions about the relevant data"""
    for i, row_dict in enumerate(test_list):  # i will be list index, row_dict will be col_dict
        for col_key, data_list_or_str in row_dict.items(): 
            if isinstance(data_list_or_str, list):
                data_list = data_list_or_str
                for ref_item in data_list:  # will be '1.1.1.1:1111' or similar
                    if col2_data == ref_item:
                        test_list[i][col_key] = col3_data
                        # change return to break if there are multiple instances
                        # of col2 data to change
                        return
            else:  # not a list
                data_str = data_list_or_str
                if col2_data == data_str:
                    test_list[i][col_key] = col3_data
                    # change return to break if there are multiple instances of 
                    # col2 data to change
                    return

def sheet_processor(test_list, sheet_data):
    """Sheet data is a two dimensional list of rows and sublists of columns"""
    for row in sheet_data:
        col2_data = row[1]
        col3_data = row[2]
        compare_K3(test_list, col2_data, col3_data)

I have renamed your test_dict to test_list as it is actually a list of dictionaries.

It wasn't clear from your question how you are getting the data from Excel to Python, so for simplicity I made it a two dimensional list of rows and columns. You can modify function sheet processor if it is a dictionary or something else.

I have also taken the assumption that the relevant compare data is in 'K3'. If it may be in another entry, we can loop through each entry, though you can use compare_general, though it is slightly long-winded.

Finally, if there are multiple col3 entries which need to be changed from a single data input, change the 'return' statement to 'break'. I think this should continue looping.

To run the example:

In [1]: so.test_list
Out[1]:
[{'col1': 'A1', 'K2': ['H1', 'H2'], 'K3': ['1.1.1.1:1111', '0.0.0.0:0000']},
 {'col1': 'B2', 'K2': ['H4', 'H5'], 'K3': ['2.2.2.2:2222', '8.8.8.8:8888']}]

In [2]: sheet_processor(so.test_list, so.example_data)

In [3]: test_list
Out[3]:
[{'col1': 'A1',
  'K2': ['H1', 'H2'],
  'K3': '1.1.1.1:1111,2.2.2.2:1111,3.3.3.3:1111,4.4.4.4:1111'},
 {'col1': 'B2',
  'K2': ['H4', 'H5'],
  'K3': '2.2.2.2:2222,5.5.5.5:5555,6.6.6.6:6666'}]

Compare and replace values from excel columns into a list of dictionaries in python

Answers (1)

Related Questions