Reputation: 23
I am trying to read multiple excel files into a data frame and but I can't seem to find a way to keep the file name as a column to reference to where it came from. Also, I need to filter the name of the excel file and the date created before I do read_excel. (there are so many files that I do not want to read them if I don't need to) This is what I have:
res = []
for root, dirs, files in os.walk('.../Minutes/', topdown=True):
if len(files) > 0:
res.extend(zip([root]*len(files), files))
df = pd.DataFrame(res, columns=['Path', 'File_Name'])
df['FullDir'] = df.Path+'\\'+df.File_Name
list_ = []
for f in df["FullDir"]:
data = pd.read_excel(f, sheet_name = 1)
list_.append(data)
df2 = pd.concat(list_)
df2
What I would like as an output
A B filename File Date Created
0 a a File1 1-1-2018
1 b b File1 1-1-2018
2 c c FIle2 2-1-2018
3 a a File2 2-1-2018
Any help would be greatly appreciated!!
Upvotes: 2
Views: 1556
Reputation: 323276
You can using concat
with keys
, then reset_index
res = []
for root, dirs, files in os.walk('.../Minutes/', topdown=True):
if len(files) > 0:
res.extend(zip([root]*len(files), files))
df = pd.DataFrame(res, columns=['Path', 'File_Name'])
df['FullDir'] = df.Path+'\\'+df.File_Name
Assuming above code is work as expected
list_ = []
for f in df["FullDir"]:
data = pd.read_excel(f, sheet_name = 1)
list_.append(data)
df2 = pd.concat(list_, keys=df.File_Name.values.tolist()).reset_index(level=0)
Upvotes: 2