I have a large number of files containing 2-dimensional data, listed below, from which I calculate an XX parameter:
'2019-10-12_17-43.csv',
'2019-10-12_17-42.csv',
'2019-10-12_17-41.csv',
'2019-10-12_17-44.csv',
'2019-10-12_17-40.csv',
'2019-10-11_17-40.csv',
......................
and so on...
I can create a list of filenames and calculate the XX parameter for each file. After further calculations, I create a DataFrame named YY that contains the parameter along with a column holding the filename it was calculated from. For a given value of the calculated XX parameter, I would like to plot all the 2-dimensional data that gives rise to it, so I also create a list of filenames from that column of the DataFrame. The code up to the XX calculation is longer, but to read the data from the selected filenames I use the following code in the last block:
import glob
import os

import pandas as pd

# arbitrary functions (bodies elided)
def Aval(a, b):
    ...

def Bval(a, b):
    ...

file_path = r"C:\Users\Desktop\Data"
read_files = glob.glob(os.path.join(file_path, "*.csv"))

# generating the list of filenames
file_list = []
XYZ_array = []
ABC_array = []
for (root, dirs, files) in os.walk(file_path):
    for filenames in files:
        file_list.append(filenames)
        df = pd.read_csv(os.path.join(root, filenames), header=0)
        # Calculation from the files (elided)
        ABC = ...
        XYZ = ...
        ABC_array.append(ABC)
        XYZ_array.append(XYZ)

# creating a dataframe from the arrays
newdf = pd.DataFrame({'ABC': ABC_array, 'XYZ': XYZ_array, 'Filename': file_list})
The dataframe generated looks like this:
Timestamp ABC XYZ Filename
2019-10-11_07-52 1.934985 0.187962 2019-10-11_07-52.csv
2019-10-11_07-53 1.926435 0.200828 2019-10-11_07-53.csv
2019-10-11_07-54 1.922927 0.215204 2019-10-11_07-54.csv
2019-10-11_07-55 1.951818 0.216678 2019-10-11_07-55.csv
2019-10-11_07-56 1.922523 0.245144 2019-10-11_07-56.csv
... ... ... ...
2019-10-13_18-21 2.028409 1.149067 2019-10-13_18-21.csv
2019-10-13_18-22 2.027896 1.015862 2019-10-13_18-22.csv
2019-10-13_18-23 2.013004 0.871320 2019-10-13_18-23.csv
2019-10-13_18-24 1.991576 0.755164 2019-10-13_18-24.csv
2019-10-13_18-25 1.908259 0.570786 2019-10-13_18-25.csv
The ABC values are binned into three bins:
bins = [1.76, 1.86, 1.96]
Abc_sorted = newdf.sort_values('ABC')
Abc_sorted['Bin_names'] = pd.cut(Abc_sorted['ABC'], bins, labels=['1.76', '1.86', '1.96'])
T_df = Abc_sorted.sort_values(by=['Bin_names']).dropna()
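In the binning step, `pd.cut` treats `bins` as bin edges, so it needs one more edge than there are labels; three edges with three labels raises a `ValueError`. A minimal sketch, assuming made-up `ABC` values and a hypothetical fourth edge of 2.06 that does not come from the question, shows a three-bin version:

```python
import pandas as pd

# Toy stand-in for newdf (values are made up for illustration only)
newdf = pd.DataFrame({
    "ABC": [1.80, 1.77, 1.90, 1.99, 2.10],
    "Filename": ["a.csv", "b.csv", "c.csv", "d.csv", "e.csv"],
})

# pd.cut needs one more edge than labels: three labels -> four edges.
# The last edge (2.06) is a hypothetical value chosen for this sketch.
bins = [1.76, 1.86, 1.96, 2.06]
labels = ["1.76", "1.86", "1.96"]

abc_sorted = newdf.sort_values("ABC")
abc_sorted["Bin_names"] = pd.cut(abc_sorted["ABC"], bins, labels=labels)

# Rows that fall outside every bin get NaN and are dropped here
t_df = abc_sorted.sort_values(by=["Bin_names"]).dropna()
print(t_df)
```

By default the intervals are closed on the right, so a value of exactly 1.76 falls outside the first bin and is dropped by `dropna()`; passing `include_lowest=True` to `pd.cut` keeps it.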
This results in a DataFrame like:
Timestamp ABC XYZ Filename Bin_names
2019-10-12_17-43 1.769676 72.841836 2019-10-12_17-43.csv 1.76
2019-10-12_17-42 1.771429 74.583635 2019-10-12_17-42.csv 1.76
2019-10-12_17-41 1.774526 76.104981 2019-10-12_17-41.csv 1.76
2019-10-12_17-44 1.774678 68.314091 2019-10-12_17-44.csv 1.76
2019-10-12_17-40 1.779273 76.589191 2019-10-12_17-40.csv 1.76
... ... ... ... ...
2019-10-12_09-48 1.988249 85.279987 2019-10-12_09-48.csv 1.96
2019-10-13_09-04 1.988266 28.716690 2019-10-13_09-04.csv 1.96
2019-10-12_11-27 1.988597 76.978562 2019-10-12_11-27.csv 1.96
2019-10-11_16-19 1.985438 76.343396 2019-10-11_16-19.csv 1.96
2019-10-11_08-11 1.999933 0.251199 2019-10-11_08-11.csv 1.96
A new DataFrame containing only the rows in bin 1.76 (with the Filename and Bin_names columns) is created, along with a list of those filenames:
ndf = T_df.loc[T_df.Bin_names == '1.76'][['Filename', 'Bin_names']]
filename_list = ndf['Filename'].tolist()
This results in the following DataFrame:
Filename Bin_names
2019-10-12_17-43.csv 1.76
2019-10-12_17-42.csv 1.76
2019-10-12_17-41.csv 1.76
2019-10-12_17-44.csv 1.76
2019-10-12_17-40.csv 1.76
Now the main task is to import the files in filename_list from the main directory:
import fnmatch

for i in range(len(filename_list)):
    print(filename_list[i])
    for file in read_files:
        if fnmatch.fnmatch(file, filename_list[i]):
            print(file)
where read_files is the list of CSV paths found by glob, file is one path from that list, and filename_list is the list of selected filenames. I have binned the data into 3 different values, and I want to import only the files whose ABC value falls in the 1.76 bin. But this doesn't seem to work, and nothing is returned. Could anyone help?
If ndf looks like this:
>>> ndf
Filename Bin_names
0 2019-10-12_17-43.csv 1.76
1 2019-10-12_17-42.csv 1.76
2 2019-10-12_17-41.csv 1.76
3 2019-10-12_17-44.csv 1.76
4 2019-10-12_17-40.csv 1.76
and filename_list looks like this:
>>> filename_list = ndf['Filename'].to_list()
>>> filename_list
['2019-10-12_17-43.csv', '2019-10-12_17-42.csv', '2019-10-12_17-41.csv', '2019-10-12_17-44.csv', '2019-10-12_17-40.csv']
and the files are located in
file_path = r"C:\Users\Desktop\Data"
then the complete paths to all your files should be:
>>> [os.path.join(file_path, name) for name in filename_list]
['C:\\Users\\Desktop\\Data\\2019-10-12_17-43.csv', 'C:\\Users\\Desktop\\Data\\2019-10-12_17-42.csv', 'C:\\Users\\Desktop\\Data\\2019-10-12_17-41.csv', 'C:\\Users\\Desktop\\Data\\2019-10-12_17-44.csv', 'C:\\Users\\Desktop\\Data\\2019-10-12_17-40.csv']
>>>
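The loop in the question most likely finds nothing because `fnmatch.fnmatch` matches the pattern against the whole string: each entry of read_files is a full path, while each pattern in filename_list is a bare filename. A minimal sketch with made-up paths (the `data/csv` directory is hypothetical) shows the mismatch and one fix, matching on `os.path.basename` instead:

```python
import fnmatch
import os

# Hypothetical directory and filenames standing in for file_path and the glob output
file_path = os.path.join("data", "csv")
read_files = [os.path.join(file_path, n) for n in
              ["2019-10-12_17-43.csv", "2019-10-12_17-42.csv", "2019-10-11_17-40.csv"]]
filename_list = ["2019-10-12_17-43.csv", "2019-10-12_17-42.csv"]

# fnmatch compares the whole string, so a full path never matches a bare filename
no_match = [f for f in read_files for name in filename_list if fnmatch.fnmatch(f, name)]

# Comparing only the final path component finds the selected files
matches = [f for f in read_files
           if any(fnmatch.fnmatch(os.path.basename(f), name) for name in filename_list)]

print(no_match)
print(matches)
```

Since the filenames contain no wildcards, joining them onto file_path directly, as above, avoids the search altogether.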
You could also add the file path to the Filename column:
>>> ndf.Filename.apply(lambda x: os.path.join(file_path,x))
0 C:\Users\Desktop\Data\2019-10-12_17-43.csv
1 C:\Users\Desktop\Data\2019-10-12_17-42.csv
2 C:\Users\Desktop\Data\2019-10-12_17-41.csv
3 C:\Users\Desktop\Data\2019-10-12_17-44.csv
4 C:\Users\Desktop\Data\2019-10-12_17-40.csv
Name: Filename, dtype: object
>>>
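Once the full paths are stored next to the filenames, each selected file can be read directly with `pd.read_csv`. A minimal sketch, assuming a temporary directory with small dummy CSVs in place of the real data folder, and a hypothetical `Path` column name:

```python
import os
import tempfile

import pandas as pd

ndf = pd.DataFrame({
    "Filename": ["2019-10-12_17-43.csv", "2019-10-12_17-42.csv"],
    "Bin_names": ["1.76", "1.76"],
})

with tempfile.TemporaryDirectory() as file_path:
    # Write small dummy CSVs (made-up data) so the sketch is self-contained
    for name in ndf["Filename"]:
        pd.DataFrame({"x": [0, 1], "y": [2, 3]}).to_csv(os.path.join(file_path, name), index=False)

    # Store the full path next to the bare filename, then read each selected file
    ndf["Path"] = ndf["Filename"].apply(lambda x: os.path.join(file_path, x))
    frames = {row.Filename: pd.read_csv(row.Path) for row in ndf.itertuples()}

print({name: df.shape for name, df in frames.items()})
```

Keying the loaded frames by filename keeps each one tied to its row in ndf, which is convenient when plotting them one by one.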
Or, using pathlib:
>>> import pathlib
>>> p = pathlib.PurePath(file_path)
>>> ndf.Filename.apply(p.joinpath)
0 C:\Users\Desktop\Data\2019-10-12_17-43.csv
1 C:\Users\Desktop\Data\2019-10-12_17-42.csv
2 C:\Users\Desktop\Data\2019-10-12_17-41.csv
3 C:\Users\Desktop\Data\2019-10-12_17-44.csv
4 C:\Users\Desktop\Data\2019-10-12_17-40.csv
Name: Filename, dtype: object
>>>
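A concrete `pathlib.Path` (rather than `PurePath`) can also query the filesystem, which makes it easy to check which selected files actually exist before reading them. A small sketch, assuming a temporary directory as a stand-in for file_path in which only one of the two files is created:

```python
import tempfile
from pathlib import Path

filename_list = ["2019-10-12_17-43.csv", "2019-10-12_17-42.csv"]

with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)  # stands in for file_path
    # Only the first file is created, so the second should be reported missing
    (data_dir / filename_list[0]).write_text("x,y\n0,2\n1,3\n")

    paths = [data_dir / name for name in filename_list]
    found = [p.name for p in paths if p.is_file()]
    missing = [p.name for p in paths if not p.is_file()]

print(found, missing)
```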
You used os.walk to find all the files, then appended only the filename to a list, even though you had to use os.path.join(root, filenames) to open each file with pandas. Perhaps the files are in different directories; if so, you should save the whole path when you build file_list. Then you will be able to access the files using their absolute paths without searching for them.
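A minimal sketch of saving the whole path during the os.walk pass, assuming a hypothetical layout with CSVs in two subdirectories and a placeholder calculation (`df["y"].mean()`) in place of the real ABC computation, which the question elides:

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as file_path:
    # Hypothetical layout: CSVs spread over two subdirectories (made-up data)
    for sub, name in [("day1", "2019-10-11_17-40.csv"), ("day2", "2019-10-12_17-43.csv")]:
        os.makedirs(os.path.join(file_path, sub), exist_ok=True)
        pd.DataFrame({"x": [0, 1], "y": [2, 3]}).to_csv(
            os.path.join(file_path, sub, name), index=False)

    rows = []
    for root, dirs, files in os.walk(file_path):
        for filename in files:
            if not filename.endswith(".csv"):
                continue
            full_path = os.path.join(root, filename)
            df = pd.read_csv(full_path)
            # Placeholder for the real ABC calculation (hypothetical)
            abc = df["y"].mean()
            rows.append({"ABC": abc, "Filename": filename, "Path": full_path})

    newdf = pd.DataFrame(rows)
    # Any selected subset can later be re-read straight from its stored path
    reread = [pd.read_csv(p) for p in newdf["Path"]]

print(newdf[["Filename", "ABC"]])
```

Building one dict per file and creating the DataFrame at the end also keeps ABC, XYZ and the path aligned row by row, instead of relying on several parallel lists staying in step.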