Reputation: 91
I have several files with the same format but different values. With the help of StackOverflow users I got the code running, and now I am trying to optimize it and need some help.
This is the full code:
import pandas as pd
# filenames
excel_names = ["file-JAN_2019.xlsx", "example-JAN_2019.xlsx", "stuff-JAN_2019.xlsx"]
# read them in
excels = [pd.ExcelFile(name) for name in excel_names]
# turn them into dataframes
frames = [x.parse(x.sheet_names[0], header=None, index_col=None) for x in excels]
#frames = [df.iloc[20:, :] for df in frames]
frames_2 = [df.iloc[21:, :] for df in frames[1:]]
#And combine them separately
combined = pd.concat([frames[0], *frames_2])
# concatenate them..
#combined = pd.concat(frames)
combined = combined[~combined[4].isin(['-'])]
combined.dropna(subset=[4], inplace=True)
# write it out
combined.to_excel("c.xlsx", header=False, index=False)
The code that I am trying to use is as follows:
from glob import glob
excel_names = glob.glob('*JAN_2019-jan.xlsx')
files = []
for names in (excel_names):
    files.extend(names)
print(files)
At the moment I am getting the following error:
Traceback (most recent call last):
  File "finaltwek.py", line 4, in <module>
    excel_names = glob.glob('*JAN_2019-jan.xlsx')
AttributeError: 'function' object has no attribute 'glob'
While tweaking the code I did get it to run, but it found every file in the folder, and I need only the files whose names end the same way, including the extension.
I am trying to make the code more dynamic by having it find all the files that share the same ending and are located in the same folder, but I can't make it work. Can anyone help? Thanks
Upvotes: 2
Views: 12981
Reputation: 21
If you want to call glob.glob(), import the module itself:
import glob
# then use
file_names = glob.glob('*.xlsx')
In your code you import the glob function from the glob module, so the name glob already refers to the function and glob.glob() raises the AttributeError. With that import, call the function directly:
from glob import glob
excel_names = glob('*JAN_2019-jan.xlsx')
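To see that both import styles return the same result, here is a minimal sketch. The temporary folder, the file names in it, and the alias glob_fn are made up for illustration and are not part of the original code.

```python
import glob                        # the module: call glob.glob(...)
from glob import glob as glob_fn   # the function under a hypothetical alias
import os
import tempfile

# Build a throwaway folder with a few empty files (hypothetical names).
folder = tempfile.mkdtemp()
for name in ["a-JAN_2019-jan.xlsx", "b-JAN_2019-jan.xlsx", "notes.txt"]:
    open(os.path.join(folder, name), "w").close()

pattern = os.path.join(folder, "*JAN_2019-jan.xlsx")

# Both calls run the same function; only the name used to reach it differs.
via_module = sorted(glob.glob(pattern))
via_function = sorted(glob_fn(pattern))

print([os.path.basename(p) for p in via_module])
```

Only the two files ending in JAN_2019-jan.xlsx are returned; notes.txt is skipped because it does not match the pattern.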
Upvotes: 2
Reputation: 18106
glob.glob("*JAN_2019-jan.xlsx")
searches relative to the current working directory (the directory the script is launched from), which is not necessarily the directory that contains your script.
You can construct a full path with os.path.join(...)
and os.path.dirname(__file__)
to point to your script's directory:
import os
import glob
excel_names = glob.glob(os.path.join(os.path.dirname(__file__), '*JAN_2019-jan.xlsx'))
print(excel_names)
Prints for me:
['/tmp/ex-JAN_2019-jan.xlsx']
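The same idea can be wrapped in a small helper that takes the folder explicitly, which also works where __file__ is not defined (for example in an interactive session). This is only a sketch: the helper name find_files and the sample file names are assumptions for the demo, not something from the question.

```python
import glob
import os
import tempfile

def find_files(folder, pattern):
    """Return the sorted paths in `folder` whose names match `pattern`."""
    # glob.escape keeps characters like [ or * in the folder path literal,
    # so only `pattern` is treated as a wildcard expression.
    return sorted(glob.glob(os.path.join(glob.escape(folder), pattern)))

# Demo in a throwaway folder with hypothetical file names.
with tempfile.TemporaryDirectory() as folder:
    for name in ["file-JAN_2019-jan.xlsx", "stuff-JAN_2019-jan.xlsx",
                 "file-FEB_2019-feb.xlsx", "file-JAN_2019-jan.csv"]:
        open(os.path.join(folder, name), "w").close()

    found = find_files(folder, "*JAN_2019-jan.xlsx")
    matches = [os.path.basename(p) for p in found]
    print(matches)
```

In a script you could pass os.path.dirname(os.path.abspath(__file__)) as the folder, and the resulting list can replace the hard-coded excel_names from the question.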
Upvotes: 2