Aaraeus
Aaraeus

Reputation: 1155

Copying an entire column using OpenPyXL in Python 3

I'm trying to copy an entire column over using OpenPyXL. Google seems to offer a lot of examples using ranges, but not for an entire column.

I have a workbook with a single worksheet with a load of dates in column A and column JX (A contains monthly dates, JX contains quarterly dates). I want the monthly dates column (in A:A) to be copied over to each worksheet ending in 'M' in my target workbook, and the quarterly dates column (in JX:JX) to the worksheets ending in 'Q'.

However, for some reason the last nested for loop, for src, dst in zip(ws_base[monthRange], ws_target['A:A']): is only copying the first cell, and nothing else. It looks like I'm identifying the correct column with my monthRange and quarterRange strings, but Python isn't looping through the whole column despite the fact that I've got two ranges defined.

Does anyone have any ideas?

# Load the target workbook
targetwb = openpyxl.load_workbook('pythonOutput.xlsx')


# Load the source workbook
wb_base = openpyxl.load_workbook('Baseline_IFRS9_' + reportingMonth+'.xlsx')

# Go to row 9 and find "Geography:" to identify the relevant 
# month and quarter date columns

sentinel = u"Geography:"
ws_base = wb_base.active

found = 0
dateColumns = []

for column in ws_base:
    for cell in column:
        if cell.value == sentinel:
            dateColumns.append(cell.column) #
            found + 1

            if found == 2:
                break


ColumnM = dateColumns[0]
ColumnQ = dateColumns[1]

print('Monthly col is ' + ColumnM)
print('Quarterly col is ' + ColumnQ)

IndexM = int(openpyxl.utils.column_index_from_string(str(ColumnM)))
IndexQ = int(openpyxl.utils.column_index_from_string(str(ColumnQ)))

print('Monthly col index is ' + str(IndexM))
print('Quarterly col index is ' + str(IndexQ))

print('Proceeding to paste into our new workbook...')

sheetLoop = targetwb.get_sheet_names()


for sheets in sheetLoop:
    if sheets.endswith('Q'):
        ws_target = targetwb[sheets]
        quarterRange = ColumnQ + ':' + ColumnQ

        print('Copying and pasting quarterly dates into: ' + sheets)
        for src, dst in zip(ws_base[quarterRange], ws_target['A:A']):
            dst.value = src.value

    elif sheets.endswith('M'):
        ws_target = targetwb[sheets]
        monthRange = ColumnM + ':' + ColumnM

        print('Copying and pasting monthly dates into: ' + sheets)
        for src, dst in zip(ws_base[monthRange], ws_target['A:A']):
            dst.value = src.value

targetwb.save('pythonOutput.xlsx')

Here's a simpler form of my problem.

import openpyxl

wb1 = openpyxl.load_workbook('pythonInput.xlsx')
ws1 = wb1.active

wb2 = openpyxl.load_workbook('pythonOutput.xlsx')
ws2 = wb2.active

for src, dst in zip(ws1['A:A'], ws2['B:B']):
    print( 'Printing from ' + str(src.column) + str(src.row) + ' to ' + str(dst.column) + str(dst.row))
    dst.value = src.value

wb2.save('test.xlsx') 

So the problem here is that the for loop only prints from A1 to B1. Shouldn't it be looping down across rows..?

Upvotes: 2

Views: 10294

Answers (1)

Norrius
Norrius

Reputation: 7920

When you load a new XLSX in a spreadsheet editor, you see lots and lots of empty cells in a grid. However, these empty cells are actually omitted from the file, and they will be only written once they have a non-empty value. You can see for yourself: XLSX is essentially a bunch of ZIP-compressed XMLs, which can be opened with any archive manager.

In a similar fashion, new cells in OpenPyXL are only created when you access them. The ws2['B:B'] range only contains one cell, B1, and zip stops when the shortest iterator is exhausted.

With this in mind, you can iterate through the source range and use explicit coordinates to save the values in correct cells:

import openpyxl

wb1 = openpyxl.load_workbook('pythonInput.xlsx')
ws1 = wb1.active

wb2 = openpyxl.load_workbook('pythonOutput.xlsx')
ws2 = wb2.active

for cell in ws1['A:A']:
    print('Printing from ' + str(cell.column) + str(cell.row))
    ws2.cell(row=cell.row, column=2, value=cell.value)

wb2.save('test.xlsx') 

Upvotes: 2

Related Questions