Reputation: 163
I'm trying in here to save the cell values from a specific row as keys and cell values from another row as values, all together in a dictionary.
Using the code below, I managed to save first row(green marked) as keys in the dictionary.
But what I'm struggling with, is to save the second row(yellow marked) as values in the dictionary.
dictData = {}
#get row 2
for row1 in ws1.iter_rows(min_row=2, max_row=2, min_col=3, max_col=ws1.max_column):
for cell1 in row1:
#get row 5
for row2 in ws1.iter_rows(min_row=5, max_row=row_data, min_col=5, max_col=ws1.max_column):
for cell2 in row2:
dictData[cell1.value] = cell2.value
While running the above mentioned code, it stores for each value in the dictionary, only the last data from the row("20"-data from China).
{'Total': 20, 'USA': 20, 'Canada': 20, 'UK': 20, 'France': 20, 'Germany': 20, 'India': 20, 'Japan': 20, 'Singapore': 20, 'China': 20}
Does anyone have any idea/feedback on how to solve this? For sure I'm missing something in the 'for loop'.
Thank you! :)
Upvotes: 0
Views: 1947
Reputation: 28699
just had a look at ur shared data; since you are interested in just two rows (rows 2 and 5), you can read them individually,zip the data, and dict the zipped data. the relevant data starts from column 3, so I've included that as well:
from openpyxl import load_workbook
filename = 'Project_yxz.xlsx'
wb = load_workbook(filename)
ws = wb.active
#read in the data
row2 = ws.iter_rows(min_row=2, max_row=2, min_col=3, values_only=True)
row5 = ws.iter_rows(min_row=5, max_row=5, min_col=3, values_only=True)
#zip and dict
res = dict(zip(*row2,*row5))
res
{'Total': 720,
'USA': 72,
'Canada': 34,
'UK': 54,
'France': 46,
'Germany': 38,
'India': 120,
'Japan': 101,
'Singapore': 47,
'China': 20}
Lemme know if this is what u have in mind. further reading from openpyxl docs here
Upvotes: 1
Reputation: 6891
This does not call for a nested loop. For each outer value (dict key) you need only one inner value (dict value). Thus, the inner loop should be off size 1, which is no loop at all.
However, you must of course align your inne values with the outer ones. In most classical programming languages this would be achieved looping over an index used to address both items, i.e.
# Naive way
dictData = {}
for i in range(2, len(ws1[3])):
dictData[ws1[3][i].value] = ws1[5][i].value
This, however, is not very pythonic, as Python allows looping directly over the values. But looping over the outer values only, still requires an index to the inner values. This can be found using enumerate
in the loop, to get both the value and its index:
# Enumerate inner value (not very nice)
dictData = {}
for i, key in enumerate(ws1[3][2:], 2):
dictData[key.value] = ws1[5][i].value
Not much is gained by this, though, as we still need an index. We really want to be able to get both the dict key and the dict value from the loop directly. And, lo and behold, Python has a solution ready for us, namely using zip
which allows us to align two series in one for
loop:
# Using zip, nicer and more pythonic
dictData = {}
for key, value in zip(ws1[3][2:], ws1[5][2:]):
dictData[key.value] = value.value
Now we are getting somewhere. This is more like it, but we can go even further, using so-called dict comprehension, which allows us to create and assign the dict in one go:
# dict comprehension
dictData = {key.value: value.value for key, value in zip(ws1[3][2:], ws1[5][2:])}
Either of the last two is OK, I would say, even though I prefer the dict comprehension as I find it quite easy to read, but your milage may vary.
Also, as a final note, to get the code to feel even more pythonic, replace the camelCased dictData
with the more PEP8 compliant form dict_data
. But that is of course only correct if that style is used for all variables in your project. It is more important to be consistent.
Upvotes: 1
Reputation: 7131
Try not to nest the iterations. Read row 2 on its own.
Then iterate from 5 to something.
If you need the header for something then store it in a different directory e.g. with column name as key. Then access it later.
You create the following dictionary from the first row:
headers = {1:'Total', 2:'USA'}
Later when you iterate through the rest of the rows and columns you access it to get the name. You know the column index. Maybe you need to fix the offset or something.
some_other_dictionary = {headers[column_index]: 'some_value'}
Another possibility is to use something like pandas.read_excel which should do the hard work for you.
Upvotes: 1