Reputation: 618
I have a lot of excel files that I need to compile into a single excel file, and then copy the compiled one into an existing excel file (with macro / .xlsm) in a certain sheet.
I solved the first problem (compiling multiple excel files into a single excel file). The resulted dataframe is saved in .csv format. The resulted file looks like this.
Until here there is no issue. The next step that I am struggling to find out how to do it.
From the resulted dataframe I want to "copy-and-paste" the dataframe to the existing excel file with macro (.xlsm) in the sheet "Source" at the corresponding headers. The existing excel file looks like this.
As you may see from the picture above, I want to skip writing any data in the column A since the cells within this column is full of formula. I want to write the resulted dataframe in the column B to column Q within the existing excel file. However, before writing the data, I want delete all the existing data in all cells (except in the cells within column A).
So basically I want to do the following:
Any feedback will be much appreciated! Thanks!
Regards,
Arnold
Upvotes: 0
Views: 11798
Reputation: 401
As an alternative you can use xlwings
import pandas as pd
import xlwings
df = pd.DataFrame({
"A": [420, 380, 390],
"B": [50, 40, 45],
"C": [60, 50, 20]
})
with xlwings.App(visible=False) as app:
wb = xlwings.Book('Test.xlsm')
ws = wb.sheets('MySheet')
ws.range('B1:D50').delete(shift='up') #if you want to delete the values, if not they will be overwritten
ws.range('B1:D1').options(header=False, index=False).value = df[['A','B','C' ]]
wb.save()
wb.close()
Upvotes: 0
Reputation: 21
I found the follwing solution based on openpyxl. What I learned is that xlsxwriter is not able to open existing excel files. Therefore, my approach is based on openpyxl.
import pandas as pd
import openpyxl # one excel reader/writer compatible with pandas
book = openpyxl.load_workbook('input.xlsm', keep_vba = True) # Load existing .xlsm file
with pd.ExcelWriter('output.xlsm', engine='openpyxl') as writer: # open a writer instance with the filename of the
writer.book = book # Hand over input workbook
writer.sheets = dict((ws.title, ws) for ws in book.worksheets) # Hand over worksheets
writer.vba_archive = book.vba_archive # Hand over VBA information
df_write.to_excel(writer, sheet_name = 'Sheet1', columns = ['A'],
header = False, index = False,
startrow = 1, startcol = 0)
# Writes a column A of the Dataframe into the excel sheet 'Sheet1', which can
# already be present in input.xlsm, to row 1, col 0
writer.save()
Upvotes: 2
Reputation: 618
Sorry a bit late to come back updating my question. Finally I have solved my problem with openpyxl package.
So here is my final code:
import openpyxl
import os
import string
import pandas as pd
import numpy as np
path = #folder directory
target_file = #excel filename
sheetname = #working sheet that you wish to work on with
filename = os.path.join(path, target_file)
wb = openpyxl.load_workbook(filename, keep_vba=True)
sheet = wb.get_sheet_by_name(sheetname)
# To Erase All Values within Selected Columns
d = dict()
for x, y in zip(range(1, 27), string.ascii_lowercase):
d[x] = y.upper()
max_row = sheet.max_row
max_col = sheet.max_column
for row in range(max_row):
row += 1
if row == 1: continue
for col in range(max_col):
col += 1
if col == 1: continue
sheet['{}{}'.format(d[col], row)] = None
# To Write Values to the Blank Worksheet
path_dataframe = # folder directory to the csv file
target_compiled = # csv filename
filename_compiled = os.path.join(path_compiled, target_compiled)
compiled = pd.read_csv(filename_compiled, low_memory=False, encoding = "ISO-8859-1")
for row in range(len(compiled.index)):
row += 1
if row == 1: continue # I do not want to change the value in row 1 in excel file because they are headers
for col in range(max_col):
col += 1
if col == 1: continue # I do not want to change the values in column 1 in excel file since they contain formula
value = compiled.iloc[row-2][col-2]
if type(value) is str: value = value
elif type(value) is np.float64: value = float(value)
elif type(value) is np.int64: value = int(value)
sheet['{}{}'.format(d[col], row)] = value
wb.save(filename)
Upvotes: 0
Reputation: 107767
Since your csv import into spreadsheet can be handled with an Excel VBA macro using QueryTables consider having Python replicate VBA with a COM interface to the Excel Object Library. All previous macro code remains intact as nothing is overwritten but cell data. Note: Below assumes you are using Excel for Windows.
Using the win32com
library, Python can replicate almost anything VBA does. In fact, you will come to know VBA is an add-on reference in Office applications and never a native, built-in object and does the same COM interfacing! See first selected item in Tools\References
in your IDE.
import pandas as pd
import win32com.client as win32
# ...same pandas code...
macrofile = "C:\\Path\\To\\Macro\\Workbook.xlsm"
strfile = "C:\\Path\\To\\CSV\\Output.csv"
df.to_csv(strfile)
try:
xl = win32.gencache.EnsureDispatch('Excel.Application')
wb = xl.Workbooks.Open(macrofile)
# DELETE PREVIOUS DATA
wb.Sheets("Source").Range("B:Q").EntireColumn.Delete()
# ADD QUERYTABLE (SPECIFYING DESTINATION CELL START)
qt = wb.Sheets("Source").QueryTables.Add(Connection="TEXT;" + strfile,
Destination=wb.Sheets(1).Cells(2, 2))
qt.TextFileParseType = 1
qt.TextFileConsecutiveDelimiter = False
qt.TextFileTabDelimiter = False
qt.TextFileSemicolonDelimiter = False
qt.TextFileCommaDelimiter = True
qt.TextFileSpaceDelimiter = False
qt.Refresh(BackgroundQuery=False)
# REMOVE QUERYTABLE
for qt in wb.Sheets("Source").QueryTables:
qt.Delete()
# CLOSES WORKBOOK AND SAVES CHANGES
wb.Close(True)
except Exception as e:
print(e)
finally:
qt = None
wb = None
xl = None
Alternatively, create a new macro in VBA (placed in a standalone module) and have Python call it, passing the csv file path as parameter:
VBA
Public Sub ImportCSV(strfile As String)
Dim qt As QueryTable
ThisWorkbook.Sheets("Source").Range("B:Q").EntireColumn.Delete
' ADD QUERYTABLE
With ThisWorkbook.Sheets("Source").QueryTables.Add(Connection:="TEXT;" & strfile, _
Destination:=ThisWorkbook.Sheets(1).Cells(2, 2))
.TextFileParseType = xlDelimited
.TextFileConsecutiveDelimiter = False
.TextFileTabDelimiter = False
.TextFileSemicolonDelimiter = False
.TextFileCommaDelimiter = True
.TextFileSpaceDelimiter = False
.Refresh BackgroundQuery:=False
End With
' REMOVE QUERYTABLE
For Each qt In ThisWorkbook.Sheets(1).QueryTables
qt.Delete
Next qt
Set qt = Nothing
End Sub
Python
import pandas as pd
import win32com.client as win32
# ...same pandas code...
macrofile = "C:\\Path\\To\\Macro\\Workbook.xlsm"
strfile = "C:\\Path\\To\\CSV\\Output.csv"
df.to_csv(strfile)
try:
xl = win32.gencache.EnsureDispatch('Excel.Application')
wb = xl.Workbooks.Open(macrofile)
xl.Application.Run('ImportCSV', strfile)
wb.Close(True)
xl.Quit
except Exception as e:
print(e)
finally:
wb = None
xl = None
Upvotes: 0