Locate dataframe and concatenate based on specific headers in Python

Question

I have lots of Excel files like the following (here are just two examples):

data1.xlsx

data2.xlsx

Is it possible to take only the part with the columns id, a, b, c, ignore the rest, and concatenate all those files into a new Excel file in Python? Thanks.

Here is what I have tried:

import os
import pandas as pd

for root, dirs, files in os.walk(src, topdown=False):
    for file in files:
        if file.endswith('.xlsx') or file.endswith('.xls'):
            #print(os.path.join(root, file))
            try:
                df0 = pd.read_excel(os.path.join(root, file))
                #print(df0)
            except:
                continue
            df1 = pd.DataFrame(columns = [columns_selected])
            df1 = df1.append(df0, ignore_index = True)
            print(df1)
            df1.to_excel('test.xlsx', index = False)

Charles R · Accepted Answer

Use skiprows and nrows. skiprows skips the rows above the header, and nrows limits how many data rows are read: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html

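import pandas as pd

# skiprows skips the lines above the header row; nrows limits the data rows read
df1 = pd.read_excel('data1.xlsx', skiprows=3, nrows=5)
df2 = pd.read_excel('data2.xlsx', skiprows=4, nrows=5)

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the frames instead
dfFinal = pd.concat([df1, df2])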
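With many files, the header row may not sit at the same offset in each one, so hard-coding the number of rows to skip per file gets tedious. A rough sketch of one alternative, assuming each sheet is read with header=None and that the headers id, a, b, c all appear in a single row: search for that row, then keep only the cells beneath those headers. The helper names extract_block and combine are made up for this illustration and are not part of pandas.

```python
import pandas as pd

WANTED = ["id", "a", "b", "c"]  # column names taken from the question


def extract_block(raw, wanted=WANTED):
    """Return the sub-table of `raw` that sits under the `wanted` headers.

    `raw` is a sheet read with header=None, so headers are ordinary cells.
    (Hypothetical helper, written for this example.)
    """
    for row_idx in range(len(raw)):
        row = raw.iloc[row_idx].tolist()
        if all(name in row for name in wanted):
            # Column positions of each wanted header within this row.
            cols = [row.index(name) for name in wanted]
            block = raw.iloc[row_idx + 1:, cols].copy()
            block.columns = wanted
            # Drop rows that are completely empty under these headers.
            return block.dropna(how="all").reset_index(drop=True)
    # No header row found: return an empty frame with the right columns.
    return pd.DataFrame(columns=wanted)


def combine(raw_frames, wanted=WANTED):
    """Stack the extracted blocks from several raw sheets."""
    blocks = [extract_block(raw, wanted) for raw in raw_frames]
    return pd.concat(blocks, ignore_index=True)
```

To use it on real files, each raw frame would come from pd.read_excel(path, header=None), and the result could be saved with combine(frames).to_excel('combined.xlsx', index=False). Note that this keeps everything below the header row down to the end of the sheet, so if other content follows the table in the same columns, nrows or an extra filter would still be needed.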
