DarthVader
DarthVader

Reputation: 55022

Combine data of one sheet from multiple excels

I have multiple excel files in the following format:

ID | Name | Prop1 | Prop2 | User

Data from excel1:

ID | Name | Prop1 | Prop2 | Prop3 | User 
1  | test |       |       |       | John

Data from Excel2:

ID | Name | Prop1 | Prop2 | Prop3 | User
1  | test |   a   |   b   |       | John

Data from Excel3:

ID | Name | Prop1 | Prop2 | Prop3 | User
1  | test |       |       |   c   | John

What I want to do is to combine these cells.

Desired output:

ID | Name | Prop1 | Prop2 | Prop3 | User
1  | test |   a   |   b   |   c   | John

If the cell is empty on a file and the other file has value in it, I would like to replace it.

is there any easy way to accomplish this?

Thanks.

Upvotes: 1

Views: 101

Answers (3)

jezrael
jezrael

Reputation: 862406

You can create list of all DataFrames by glob, for final df need combine_first with reduce:

import glob
from functools import reduce

files = glob.glob('files/*.xlsx')
dfs = [pd.read_excel(fp).set_index(['ID','Name','User']) for fp in files]

df1 = reduce(lambda l,r: pd.DataFrame.combine_first(l,r), dfs)
print (df1)
             Prop1 Prop2 Prop3
ID Name User                  
1  test John     a     b     c

EDIT: If dont need combine files with NaNs solution is simplier:

import glob

files = glob.glob('files/*.xlsx')
df = pd.concat([pd.read_excel(fp) for fp in files],ignore_index=True)

Upvotes: 4

ASH
ASH

Reputation: 20302

This should do what you want.

import pandas as pd
import numpy as np

        import glob
        glob.glob("C:/your_path_here/*.xlsx")

        all_data = pd.DataFrame()
        for f in glob.glob("C:/your_path_here/*.xlsx"):
            df = pd.read_excel(f)
            all_data = all_data.append(df,ignore_index=True)
            print(all_data)

Upvotes: 0

Rehan Azher
Rehan Azher

Reputation: 1340

try below:

df1 = pd.read_excel('Excel1.xlsx', sheetname='Sheet1');
df2= pd.read_excel('Excel2.xlsx', sheetname='Sheet1');
df3 = pd.read_excel('Excel3.xlsx', sheetname='Sheet1')
mylist = [df1,df2,d3]
df = pd.merge(df1, df2, on=['ID','USER'])
df = pd.merge(df, df3, on=['ID','USER'])
writer = pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Close the Pandas Excel writer and output the Excel file.
writer.save()

I haven't got the chance to test this , but it should work .

Upvotes: 1

Related Questions