Min Wang
Min Wang

Reputation: 305

Python Pandas copy columns from one sheet to another sheet without changing any data?

I have an excel file with two sheets. I would like to copy 3 columns from the first sheet to the second sheet.

Note:

But this is not the desired output.

If pandas can not do this, could you please suggest some other python package which could work ?

In case I'm not describing the problem clearly enough, I upload a pic which may help more or less. Thx for your answers~

jpg

UPDATE[2017.07.24]:

I Finally find my fault!

insert one column with index number and then , follow the resolution of b2002, things gonna to be well. :)

Upvotes: 3

Views: 8773

Answers (2)

b2002
b2002

Reputation: 914

This method uses pandas and xlsxwriter.

Setup (create demo excel file):

import pandas as pd

df1 = pd.DataFrame({'1_A': [1,2,3,4], '1_B': [5,4,6,5],
                    '1_C': [8,7,9,0], '1_D': [9,7,8,5], '1_E': [2,4,9,8]})
df2 = pd.DataFrame({'1_A': [5,4,1,3], '1_B': [55,2,3,4]})

setup_dict = {'Sheet_1': df1, 'Sheet_2': df2}

with pd.ExcelWriter('excel_file.xlsx',
                    engine='xlsxwriter') as writer:

    for ws_name, df_sheet in setup_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name)

(Start here to read an existing excel file)

#Read your excel file, use "sheetname=None" to create a dictionary of
#worksheet dataframes.  (Note: future versions of pandas will use
#"sheet_name" vs. "sheetname").
#Replace 'excel_file.xlsx' with the actual path to your file.
ws_dict = pd.read_excel('excel_file.xlsx', sheetname=None)
#Modify the Sheet_2 worksheet dataframe:
#(or, create a new worksheet by assigning concatenated df to a new key,
#such as ws_dict['Sheet_3'] = ...)
ws_dict['Sheet_2'] = pd.concat([ws_dict['Sheet_2'][['1_A','1_B']], 
                                ws_dict['Sheet_1'][['1_A','1_B','1_C']]],
                                axis=1)
#Write the ws_dict back to disk as an excel file:
#(replace 'excel_file.xlsx' with your desired file path.)
with pd.ExcelWriter('excel_file.xlsx',
                    engine='xlsxwriter') as writer:

    for ws_name, df_sheet in ws_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name)

Other methods may be used to combine the columns such as a join (with different suffixes representing the original worksheets, for example) since all of the worksheets are converted to dataframes when the excel file is read.

EDIT (for new worksheet and unique column names...)

ws_dict = pd.read_excel('excel_file.xlsx', sheetname=None)
#Modify the Sheet_2 worksheet dataframe:
#(or, create a new worksheet by assigning concatenated df to a new key,
#such as ws_dict['Sheet_3'] = ...)
ws_dict['Sheet_3'] = ws_dict['Sheet_2'][['1_A','1_B']].join(ws_dict['Sheet_1'][['1_A','1_B','1_C']],
                                                            lsuffix='_sh2', rsuffix='_sh1', how='outer')
#Write the ws_dict back to disk as an excel file:
#(replace 'excel_file.xlsx' with your desired file path.)
with pd.ExcelWriter('excel_file.xlsx',
                    engine='xlsxwriter') as writer:

    for ws_name, df_sheet in ws_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name)

Upvotes: 2

Parfait
Parfait

Reputation: 107707

If using Excel and Python for Windows (and if not, for future readers), consider an SQL solution with an ODBC connection to the JET/ACE Engine which can query Excel workbooks, its own Access databases, even text files (csv/tab/txt). This engine which are .dll files is installed by default with Windows machines or MS Office. This approach avoids opening any workbook.

Simply run an INNER JOIN on the sheets and use panda's read_sql() to import query resultset directly into a dataframe. Connection can use pyodbc or pypyodbc modules. And since you work in SQL, SELECT needed columns, rename them, filter with WHERE, JOIN or UNION other worksheets and in other workbooks, even aggregate with GROUP BY:

import pyodbc
import pandas as pd

strfile = "C:\Path\To\Workbook.xlsx"

conn = pyodbc.connect(r'Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};' + \
                               'DBQ={};'.format(strfile), autocommit=True)    

strSQL = " SELECT s1.[1_A] As s1_1_A, s1.[1_B] As s1_1_B," + \
         "        s2.[1_A] AS s2_1_A, s2.[1_B] As s2_1_B, s2.[1_C] As s2_1_C" + \
         " FROM [Sheet1$] s1" + \
         " INNER JOIN [Sheet2$] s2 ON s1.[index] = s2.[index]" 

df = pd.read_sql(strSQL, conn)

conn.close()

Upvotes: 1

Related Questions