Reputation: 673
I have a folder full of excel files and i have to read only 3 files from that folder and put them into individual dataframes.
File1: Asterix_New file_Jan2020.xlsx
File2: Asterix_Master file_Jan2020.xlsx
File3: Asterix_Mapping file_Jan2020.xlsx
I am aware of the below syntax which finds xlsx file from a folder but not sure how to relate it to specific keywords. In this case starting with "Asterix_"
files_xlsx = [f for f in files if f[-4:] == "xlsx"]
Also i am trying to put each of the excel file in a individual dataframe but not getting successful:
for i in files_xlsx:
df[i] = pd.read_excel(files_xlsx[0])
Any suggestions are appreciated.
Upvotes: 0
Views: 3180
Reputation: 28644
I suggest using pathlib. If all the files are in a folder:
from pathlib import Path
from fnmatch import fnmatch
folder = Path('name of folder')
Search for the files using glob. I will also suggest using fnmatch to include the files whose extensions are in capital letters.
iterdir allows you to iterate through the files in the folder
name is a method in pathlib that gives you the name of the file in string format
using the str lower method ensures that extensions such as XLSX, which is uppercase is captured
excel_only_files = [xlsx for xlsx in folder.iterdir()
if fnmatch(xlsx.name.lower(),'asterix_*.xlsx')]
OR
#you'll have to test this, i did not put it though any tests
excel_only_files = list(folder.rglob('Asterix_*.[xlsx|XLSX]')
from there, you can run a list comprehension to read your files:
dataframes = [pd.read_excel(f) for f in excel_only_files]
Upvotes: 3
Reputation: 12329
If you have read in the file names, you can make sure that it starts with and ends with the desired strings by using this list comprehension:
files = ['filea.txt', 'fileb.xlsx', 'filec.xlsx', 'notme.txt']
files_xlsx = [f for f in files if f.startswith('file') and f.endswith('xlsx')]
files_xlsx # ['fileb.xlsx', 'filec.xlsx']
The list comprehension says, "Give me all the files that start with file
AND end with xlsx
.
Upvotes: 0
Reputation: 2983
First generate a list of files you want to read in using glob
(based on @cup's answer) and then append them to a list.
import pandas as pd
import glob
my_df_list = [pd.read_excel(f) for f in glob.iglob('Asterix_*.xlsx')]
Depending on what you want to achieve, you can also use a dict
to allow for key-value pairs.
Upvotes: 1
Reputation: 51
At the end of the if statement you need to add another condition for files which also contain 'Asterix_':
files_xlsx = [f for f in files if f[-4:] == "xlsx" and "Asterix_" in f]
The f[-4:] == "xlsx"
is to make sure the last 4 characters of the file name are xlsx and "Asterix_" in f
makes sure that "Asterix_" exists anywhere in the file name.
To then read these using pandas try:
for file in excel_files:
df = pd.read_excel(file)
print(df)
That should print the result of the DataFrame read from the excel file
Upvotes: 0
Reputation: 8237
Use glob.glob to do your pattern matches
import glob
for i in glob.glob('Asterix_*.xlsx'):
...
Upvotes: 2