Reputation: 2156
Using pathlib.Path().glob(), how do we iterate through a directory and read in 2 files at each iteration?
Suppose my directory C:\Users\server\Desktop\Dataset looks like this:
P1_mean_fle.csv
P2_mean_fle.csv
P3_mean_fle.csv
P1_std_dev_fle.csv
P2_std_dev_fle.csv
P3_std_dev_fle.csv
If I want to read in only 1 file at each iteration of the Pi's, my code would look like this:
from pathlib import Path
import pandas as pd
file_path = r'C:\Users\server\Desktop\Dataset'
param_file = 'P*' + '_mean_fle.csv'
for i, fle in enumerate(Path(file_path).glob(param_file)):
    mean_fle = pd.read_csv(fle).values
    # tuning is some function which takes in the mean file
    # and does something with it
    results = tuning(mean_fle)
Now, how do I read in 2 files at each iteration of the Pi's? The code below doesn't work because param_file can only hold one file name pattern at a time. I would appreciate a way to do this using pathlib.
from pathlib import Path
import pandas as pd
param_file = 'P*' + '_mean_fle.csv'
param_file = 'P*' + '_std_dev_fle.csv' #this is wrong
for i, fle in enumerate(Path(file_path).glob(param_file)): #this is wrong inside the glob() part
    mean_fle = pd.read_csv(fle).values
    std_dev_fle = pd.read_csv(fle).values
    #tuning is some function which takes in the two files mean
    #and std_dev and does something with these 2 files
    results = tuning(mean_fle, std_dev_fle)
Thank you in advance.
Upvotes: 1
Views: 182
Reputation: 17606
I suggest two approaches:
1. If you are sure that all your files are numbered without 'holes', you can build the file names directly, without 'glob':
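from pathlib import Path
import pandas as pd

file_path = Path(r'C:\Users\server\Desktop\Dataset')
mean_csv_pattern = 'P{}_mean_fle.csv'
std_dev_pattern = 'P{}_std_dev_fle.csv'
i = 0
while True:
    i += 1
    try:
        mean_fle = pd.read_csv(file_path / mean_csv_pattern.format(i)).values
        std_dev_fle = pd.read_csv(file_path / std_dev_pattern.format(i)).values
    except FileNotFoundError:  # stop at the first missing index; add other exceptions as needed
        break
    results = tuning(mean_fle, std_dev_fle)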
2. Use a pre-fetch step that collects all your files and puts them in a structure you can query in your main loop.
Glob for the mean files, glob for the std_dev files, take the number from each file name and build a dictionary {index: {'mean_file': mean_file, 'std_file': std_file}}, then loop over the sorted dictionary keys.
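A minimal sketch of this second approach, assuming the file names follow the P<number>_mean_fle.csv / P<number>_std_dev_fle.csv scheme from the question. The helper name collect_pairs and the KINDS mapping are made up for illustration; indices that lack one of the two files are simply dropped.

```python
import re
from pathlib import Path

# Dictionary key -> the word used in the file name (naming scheme assumed from the question)
KINDS = {"mean_file": "mean", "std_file": "std_dev"}


def collect_pairs(directory):
    """Return {index: {'mean_file': Path, 'std_file': Path}} (hypothetical helper)."""
    pairs = {}
    for key, kind in KINDS.items():
        # One glob per file type, e.g. P*_mean_fle.csv
        for path in Path(directory).glob(f"P*_{kind}_fle.csv"):
            # Extract the number between 'P' and the rest of the name
            match = re.fullmatch(rf"P(\d+)_{kind}_fle\.csv", path.name)
            if match:
                pairs.setdefault(int(match.group(1)), {})[key] = path
    # Keep only the indices for which both files were found
    return {i: files for i, files in pairs.items() if len(files) == len(KINDS)}


# Possible use in the main loop (pd and tuning as in the question):
# pairs = collect_pairs(r'C:\Users\server\Desktop\Dataset')
# for i in sorted(pairs):
#     mean_fle = pd.read_csv(pairs[i]["mean_file"]).values
#     std_dev_fle = pd.read_csv(pairs[i]["std_file"]).values
#     results = tuning(mean_fle, std_dev_fle)
```

Because the keys are integers, sorted(pairs) visits P2 before P10, which a plain sort of the file names would not do.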
Upvotes: 1
Reputation: 3116
If your filenames follow deterministic rules as in the example, your best bet is to iterate over one kind of file and find the corresponding file by string replacement.
from pathlib import Path
import pandas as pd
file_path = r'C:\Users\server\Desktop\Dataset'
param_file = 'P*' + '_mean_fle.csv'
for i, fle in enumerate(Path(file_path).glob(param_file)):
    stddev_fle = fle.with_name(fle.name.replace("mean", "std_dev"))
    mean_values = pd.read_csv(fle).values
    stddev_values = pd.read_csv(stddev_fle).values
    results = tuning(mean_values, stddev_values)
Upvotes: 3