tkpk
tkpk

Reputation: 23

read_excel into data frame and keep file name as column (Pandas)

I am trying to read multiple excel files into a data frame and but I can't seem to find a way to keep the file name as a column to reference to where it came from. Also, I need to filter the name of the excel file and the date created before I do read_excel. (there are so many files that I do not want to read them if I don't need to) This is what I have:

res = []
for root, dirs, files in os.walk('.../Minutes/', topdown=True):
    if len(files) > 0:
        res.extend(zip([root]*len(files), files))

df = pd.DataFrame(res, columns=['Path', 'File_Name'])

df['FullDir'] = df.Path+'\\'+df.File_Name

list_ = []
for f in df["FullDir"]:
    data = pd.read_excel(f, sheet_name = 1)
    list_.append(data)
    df2 = pd.concat(list_)

df2

What I would like as an output

   A  B  filename  File Date Created
0  a  a  File1     1-1-2018
1  b  b  File1     1-1-2018
2  c  c  FIle2     2-1-2018
3  a  a  File2     2-1-2018

Any help would be greatly appreciated!!

Upvotes: 2

Views: 1556

Answers (1)

BENY
BENY

Reputation: 323276

You can using concat with keys , then reset_index

res = []
for root, dirs, files in os.walk('.../Minutes/', topdown=True):
    if len(files) > 0:
        res.extend(zip([root]*len(files), files))

df = pd.DataFrame(res, columns=['Path', 'File_Name'])

df['FullDir'] = df.Path+'\\'+df.File_Name

Assuming above code is work as expected

list_ = []
for f in df["FullDir"]:
    data = pd.read_excel(f, sheet_name = 1)
    list_.append(data)

df2 = pd.concat(list_, keys=df.File_Name.values.tolist()).reset_index(level=0)

Upvotes: 2

Related Questions