Reputation: 10051
If I have lots of excel files as follows (here are just two examples):
data1.xlsx
data2.xlsx
Is it possible I just take the part with columns of id, a, b, c
and ignore the rest and concatenate all those files together into a new excel file in Python. Thanks.
Here is what I have tried:
import os
for root, dirs, files in os.walk(src, topdown=False):
for file in files:
if file.endswith('.xlsx') or file.endswith('.xls'):
#print(os.path.join(root, file))
try:
df0 = pd.read_excel(os.path.join(root, file))
#print(df0)
except:
continue
df1 = pd.DataFrame(columns = [columns_selected])
df1 = df1.append(df0, ignore_index = True)
print(df1)
df1.to_excel('test.xlsx', index = False)
Upvotes: 0
Views: 67
Reputation: 16792
Extending @Charles R's answer with your requirement of multiple excel files.
# get all the files
os.chdir('C:\ExcelWorkbooksFolder')
FileList = glob.glob('*.xlsx')
print(FileList)
and then:
for File in FileList:
for x in File:
# the rest of the code for reading
Upvotes: 1
Reputation: 1661
use skpirows
and nrows
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html
import pandas
df1 = pd.read_excel('data1.xlsx', skpirows=3, nrows=5)
df2 = pd.read_excel('data2.xlsx', skpirows=4, nrows=5)
dfFinal = df1.append(df2)
Upvotes: 1