sgerbhctim
sgerbhctim

Reputation: 3640

How to open excel file, write data to column from dataframe column, save as new file

I have an Excel file that contains a column with header 'Original Translation'. I also have a DataFrame with column 'Original Translation - {language}', based on the language I am using and some manipulations.

My goal is to open the Excel file, write over the column with header 'Original Translation' with all the data from my DataFrame column 'Original Translation - {language}', preserve the original Excel file formatting, and save to a new output folder.

Here is the code I currently have:

def output_formatted_capstan_file(df, original_file, country, language):
    # this is where you generate the file:
    # https://stackoverflow.com/questions/20219254/how-to-write-to-an-existing-excel-file-without-overwriting-data-using-pandas
    try:
        print(original_file)
        book = load_workbook(original_file)
        writer = pd.ExcelWriter(original_file, engine='openpyxl')
        writer.book = book
        df.to_excel(writer, ['Original Translation - {}'.format(language)])
        writer.save()
    except:
        print('Failed')

Upvotes: 0

Views: 447

Answers (1)

HadynB
HadynB

Reputation: 81

I would approach this using the following method.

1) Import the excel file using a function such as pandas.read_excel, thus taking the data from excel into a dataframe. I'll call this exceldf

2) Merge this dataframe with the data you already have in the Pandas DataFrame. I will call your existing translated dataframe translateddf

3) Reorder the newly merged dataframe newdf and then export out the data. Further options for how to reorder are shown here: re-ordering data frame

4) Export the data to Excel. I'll leave you to integrate it into your initial code. For the generic answer to the question, others may want to look into the integrated Pandas options here to_excel

Example Code

import pandas

# Read in the Excel file
exceldf = pandas.read_excel(open('your_xls_xlsx_filename'), sheetname='Sheet 1')

# Create a new dataframe with your merged data, merging on 'key1'.
# We then drop the column of the original translation, as it should no longer be needed
# I've included the rename argument in case you need it.
newdf = exceldf.merge(translateddf, left_on=['key1'], \
                right_on=['key1']) \
.rename(columns={'Original Translation {language}': 'Original Translation {language}'}) \
.drop(['Original Translation'], axis=1)

# Re-order your data. 
# Note that if you renamed anything above, you have to update it here too
newdf = newdf[['0', '1', '2', 'Original Translation {language}']]

# An example export, that uses the generic implementation, not your specific code
pandas.newdf.to_excel("output.xlsx")

Upvotes: 1

Related Questions