Reputation: 190
I currently have a folder that contains multiple files with similar names that I am trying to read from. For example: Folder contains files:
apple_2019_08_26_23434.xls
apple_2019_08_25_55345.xls
apple_2019_08_24_99345.xls
the name format of the file is very simple:
apple_<date>_<5 random numbers>.xls
How can I read the excel file into a pandas df if I do not care about the random 5 digits at the end?
e.g.
df = pd.read_excel('e:\Document\apple_2019_08_26_<***wildcard***>.xls')
Thank you!
Upvotes: 2
Views: 821
Reputation: 316
You could use unix style pathname expansions via glob.
import glob
# get .txt files in current directory
txt_files = glob.glob('./*.txt')
# get .xls files in some_dir
xls_files = glob.glob('some_dir/*.xls')
# do stuff with files
# ...
Here, *
basically means "anything".
Example with pandas:
import glob
for xls_file in glob.glob('e:/Document/apple_2019_08_26_*.xls'):
df = pd.read_excel(xls_file)
# do stuff with df
# ...
Upvotes: 3
Reputation: 461
If you want the 5-digit part to be changeable in the code, you could try something like this:
from os import listdir
from os.path import isfile, join
import pandas as pd
mypath = '/Users/username/aPath'
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
fiveDigitNumber = onlyfiles[0][17:22]
filename = onlyfiles[0][:17]+fiveDigitNumber+onlyfiles[0][22:]
df = pd.read_excel(filename)
Upvotes: 1
Reputation: 42916
Change your directory with os.chdir
then import all files which startwith
the correct name:
import os
os.chdir(r'e:\Document')
dfs = [pd.read_excel(file) for file in os.listdir() if file.startswith('apple_2019_08')]
Now you can access each dataframe by index:
print(dfs[0])
print(dfs[1])
Or combine them to one large dataframe if they have the same format
df_all = pd.concat(dfs, ignore_index=True)
Upvotes: 2