Reputation: 25
I am having a requirement where I need to fetch a specific column from a excel file, compare the exact string against a list of dictionaries and replace it if a match is found.
E.g The sample excel sheet looks like this (test.xlsx with one sheet/tab as 'tab1'):-
col1 col2 col3
A1 1.1.1.1:1111 1.1.1.1:1111,2.2.2.2:1111,3.3.3.3:1111,4.4.4.4:1111
B2 2.2.2.2:2222 2.2.2.2:2222,5.5.5.5:5555,6.6.6.6:6666
and so on
My current dictionary looks like this:-
test_dict = [{'col1': 'A1', 'K2': ['H1', 'H2'], 'K3': ['1.1.1.1:1111','0.0.0.0:0000']},{'col1': 'B2', 'K2': ['H4', 'H5'], 'K3': ['2.2.2.2:2222','8.8.8.8:8888']}]
and so on
So, what I would require is take col2 from excel, compare it against the test_dict dictionary, if match is found, replace it with col3
My expected dictionary output would need to be:-
test_dict = [{'col1': 'A1', 'K2': ['H1', 'H2'], 'K3': ['1.1.1.1:1111','2.2.2.2:1111','3.3.3.3:1111','4.4.4.4:1111','0.0.0.0:0000']},{'col1': 'B2', 'K2': ['H4', 'H5'], 'K3': ['2.2.2.2:2222','5.5.5.5:5555','6.6.6.6:6666','8.8.8.8:8888']}]
I am new to python and needed some advice as how to achieve it. I have tried to convert the excel in another dictionary & then compare it with existing dictionary and replace it but couldn't achieve the result. Kindly help and suggest.
Upvotes: 0
Views: 261
Reputation: 820
I have put some example code below which may work for you depending on the type and shape of your input excel data.
test_list = [{'col1': 'A1',
'K2': ['H1', 'H2'],
'K3': ['1.1.1.1:1111','0.0.0.0:0000']},
{'col1': 'B2',
'K2': ['H4', 'H5'],
'K3': ['2.2.2.2:2222','8.8.8.8:8888']}
]
example_data = [['A1', '1.1.1.1:1111', '1.1.1.1:1111,2.2.2.2:1111,3.3.3.3:1111,4.4.4.4:1111'],
['B2', '2.2.2.2:2222', '2.2.2.2:2222,5.5.5.5:5555,6.6.6.6:6666']
]
def compare_K3(test_list, col2_data, col3_data):
"""This function assumes that relevant data is always in K3"""
for i, row_dict in enumerate(test_list): # i will be list index, row_dict will be col_dict
data_list = row_dict.get('K3', []) # empty list if 'K3' is not in the dict
for ref_item in data_list: # will be '1.1.1.1:1111' or similar
if col2_data == ref_item:
k3_data = test_list[i]['K3']
col3_data_items = col3_data.split(',')
for new_item in col3_data_items:
if new_item not in k3_data:
k3_data.append(new_item)
# the sorting line below is optional, added to improve readability
# if order is unimportant, test_list[i]['K3'] will be updated without this line since
# k3_data and test_list[i]['K3'] are the same object
test_list[i]['K3'] = sorted(k3_data)
# change return to break if there are multiple instances of col2 data to change
return
def compare_general(test_list, col2_data, col3_data):
"""This function does not make any assumptions about the relevant data"""
for i, row_dict in enumerate(test_list): # i will be list index, row_dict will be col_dict
for col_key, data_list_or_str in row_dict.items():
if isinstance(data_list_or_str, list):
data_list = data_list_or_str
for ref_item in data_list: # will be '1.1.1.1:1111' or similar
if col2_data == ref_item:
test_list[i][col_key] = col3_data
# change return to break if there are multiple instances
# of col2 data to change
return
else: # not a list
data_str = data_list_or_str
if col2_data == data_str:
test_list[i][col_key] = col3_data
# change return to break if there are multiple instances of
# col2 data to change
return
def sheet_processor(test_list, sheet_data):
"""Sheet data is a two dimensional list of rows and sublists of columns"""
for row in sheet_data:
col2_data = row[1]
col3_data = row[2]
compare_K3(test_list, col2_data, col3_data)
I have renamed your test_dict to test_list as it is actually a list of dictionaries.
It wasn't clear from your question how you are getting the data from Excel to Python, so for simplicity I made it a two dimensional list of rows and columns. You can modify function sheet processor if it is a dictionary or something else.
I have also taken the assumption that the relevant compare data is in 'K3'. If it may be in another entry, we can loop through each entry, though you can use compare_general, though it is slightly long-winded.
Finally, if there are multiple col3 entries which need to be changed from a single data input, change the 'return' statement to 'break'. I think this should continue looping.
To run the example:
In [1]: so.test_list
Out[1]:
[{'col1': 'A1', 'K2': ['H1', 'H2'], 'K3': ['1.1.1.1:1111', '0.0.0.0:0000']},
{'col1': 'B2', 'K2': ['H4', 'H5'], 'K3': ['2.2.2.2:2222', '8.8.8.8:8888']}]
In [2]: sheet_processor(so.test_list, so.example_data)
In [3]: test_list
Out[3]:
[{'col1': 'A1',
'K2': ['H1', 'H2'],
'K3': '1.1.1.1:1111,2.2.2.2:1111,3.3.3.3:1111,4.4.4.4:1111'},
{'col1': 'B2',
'K2': ['H4', 'H5'],
'K3': '2.2.2.2:2222,5.5.5.5:5555,6.6.6.6:6666'}]
Upvotes: 1