Natesh
Natesh

Reputation: 3

Extract excel data from multiple folders to a DataFrame and extract the folder names for each rows to the Dataframe

I am trying to extract the data from xlsx file from the multiple folders and also to get the folder name for each row to identify where the is extracted from. I am able to extract the data from all the folders however i am unable to get the folder names to the dataframe. Please help.

Folder structure -

Month-Year - 2020-02 Day folder - 2020-02-01

Under the day folder consist of xlsx files.

paths = []
arr = []

for root, dirs, files in os.walk(Full_Path):
    for file in files:
        if file.endswith(".xlsx"):
             #print(os.path.join(root, file))
             ab = os.path.join(root, file)
             print(ab)
             arr.append(paths)   
             paths.append(ab)


for lm in arr:
    print(lm)   


all_data = pd.DataFrame()
for f in paths:
    df = pd.read_excel(f, sheet_name='In Queue', usecols = fields)
    df['Date'] = lm 
    all_data = all_data.append(df,ignore_index=True)

I have also tried different ways but not getting the output.

Upvotes: 0

Views: 128

Answers (1)

Parfait
Parfait

Reputation: 107567

Consider building a list of data frames to be concatenated once outside the loop. Also, use assign to generate the Date column during the loop:

df_list = []

for root, dirs, files in os.walk(Full_Path):
    for file in files:
        if file.endswith((".xls", ".xlsx", ".xlsb", ".xlsm", ".odf")):
             xl_file = os.path.join(root, file)
             df = (pd.read_excel(xl_file, sheet_name='In Queue', usecols = fields)
                     .assign(Date = xl_file))

             df_list.append(df)


final_df = pd.concat(df_list, ignore_index=True)

Upvotes: 1

Related Questions