Reputation: 59
I need to take multiple worksheets in an excel workbook and merge them into a single dataframe based on a set of column values in each sheet.
I have:
Sheet 1:
ID A B C
1 0 l g
2 2 e n
3 3 c h
Sheet 2:
ID L M N
1 7 u i
2 0 o j
3 9 c k
I'm looking for: Sheet 3:
ID A B C L M N
1
2
3
EDIT I'm dealing with an arbitrary number of sheets, which is what makes it complicated.
I'm new to pandas/python/coding, but am working with this right now:
import pandas as pd
import numpy as np
def get_sheets():
""" Get sheets to join"""
ask = input("Are the sheets in the same workbook?".lower())
if ask == "yes" or "y":
file = input("Please enter the filepath for the workbook")
df_lib = pd.read_excel(file, None)
merged = pd.merge(df_lib.items(), how="left" on='ID')
merged.to_csv("new_merged_data.csv")
I'm returning an error because I don't have a "right" dataframe to join on. But I'm not sure how to either break apart the library of dataframes created by the pd.read_excel function or to call them within pd.merge function.
Upvotes: 0
Views: 2048
Reputation: 384
If you have an arbitrary number of sheets that you want to merge you can load all sheets with the following command:
# for pandas version < 0.21.0
sheets = pd.read_excel(file_name, sheetname=None)
# for pandas version >= 0.21.0
sheets = pd.read_excel(file_name, sheet_name=None)
This will give you an ordered dict with sheet name as key and corresponding data frame as value.
Then you will need list of data frames from sheets. You can obtain that using
dfs = list(sheets.values())
Once you have this you can use the code below to merge all sheets into one data frame.
from functools import reduce
merged = reduce(lambda left, right: pd.merge(left, right, on='ID', how='left'), dfs)
results.to_csv("new_merged_data.csv")
Please try it :)
Upvotes: 1
Reputation: 585
You can get both worksheets in 2 different dataframes and merge them.
import pandas as pd
import numpy as np
def get_sheets():
""" Get sheets to join"""
ask = input("Are the sheets in the same workbook?".lower())
if ask == "yes" or "y":
file = input("Please enter the filepath for the workbook")
df1 = pd.read_excel(file, sheet_name='Sheet1')
df2 = pd.read_excel(file, sheet_name='Sheet2')
results= df1.merge(df2, on='ID', how="left")
results.to_csv("new_merged_data.csv")
Along with this, I noticed that in your code, you are missing ,
between how="left" on='ID'
Upvotes: 0