Reputation: 845
Let's assume I have a structure like this:
Folder1
`XX_20201212.txt`
Folder1
`XX_20201212.txt`
Folder1
`XX_20201212.txt`
My current script collects the 3 files (one from each folder), processes them and combines them into 1 file. So right now my script does the job for 1 date.
Now let's assume the structure has changed to this:
Folder1
`XX_20201201.txt`
`XX_20201202.txt`
Folder1
`YY_20201201.txt`
`YY_20201202.txt`
Folder1
`ZZ_20201201.txt`
`ZZ_20201202.txt`
`ZZ_20201203.txt`
I want my script to do the same now, but for multiple dates. It should check whether a file has a date in its name that is also present in a list named `missing_dates`, and whether a file for that date is available in each directory. If so, I want to collect those files and process them into 1 file. So if we assume 20201201, 20201202 and 20201203 are in `missing_dates`, the following needs to happen:
`XX_20201201.txt`, `YY_20201201.txt` and `ZZ_20201201.txt` are processed into 1 file, because that date is present in `missing_dates` AND it's present in every directory.
`XX_20201202.txt`, `YY_20201202.txt` and `ZZ_20201202.txt` are processed into 1 file, because that date is present in `missing_dates` AND it's present in every directory.
`ZZ_20201203.txt` is skipped, because that date is not present in every directory even though it's present in `missing_dates`.
In short: 3 files with the same date (in 3 different directories), with a date that is present in `missing_dates` = proceed.
Please note that the code below, which processes the files into 1 file, is already working. The underlying problem is that I have to adjust my loop so that it can process more than 1 date, and I don't know how to do that.
This is the code that reads the files:
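import csv
import os
import re

for root, dirs, files in os.walk(counter_part):
    for file in files:
        # date part of the file name, e.g. '20201201'
        date_files = re.search(r'_(\d+)\.', file).group(1)
        # assumed: file_path is the full path of the current file
        file_path = os.path.join(root, file)
        with open(file_path, 'r') as my_file:
            reader = csv.reader(my_file, delimiter=',')
            next(reader)  # skip the header row
            for row in reader:
                if filter_row(row):
                    vehicle_loc_dict[(row[9], location_token(row))].append(row)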
Upvotes: 1
Views: 334
Reputation: 103754
With the tools in pathlib this is fairly easy.
Given:
% tree /tmp/test
/tmp/test
├── dir_1
│   ├── XX_20201201.txt
│   └── XX_20201202.txt
├── dir_2
│   ├── YY_20201201.txt
│   └── YY_20201202.txt
└── dir_3
    ├── ZZ_20201201.txt
    ├── ZZ_20201202.txt
    └── ZZ_20201203.txt
3 directories, 7 files
You can do:
from pathlib import Path

root = Path('/tmp/test')
missing_dates = ['20201201']

for fn in (e for e in root.glob('**/*.txt')
           if e.is_file() and any(d in str(e) for d in missing_dates)):
    print(fn)
    # here do what you mean by 'proceed' with path fn
Prints:
/tmp/test/dir_2/YY_20201201.txt
/tmp/test/dir_3/ZZ_20201201.txt
/tmp/test/dir_1/XX_20201201.txt
Or, you could do:
missing_dates = ['20201201', '20201202']

for d in missing_dates:
    print(f"processing {d}")
    for fn in (e for e in root.glob(f"**/*_{d}.txt") if e.is_file()):
        print(fn)
        # here do what you mean by 'proceed'
Prints:
processing 20201201
/tmp/test/dir_2/YY_20201201.txt
/tmp/test/dir_3/ZZ_20201201.txt
/tmp/test/dir_1/XX_20201201.txt
processing 20201202
/tmp/test/dir_2/YY_20201202.txt
/tmp/test/dir_3/ZZ_20201202.txt
/tmp/test/dir_1/XX_20201202.txt
If you are only interested in groups of 3, you can do:
missing_dates = ['20201201', '20201202', '20201203']

for d in missing_dates:
    print(f"processing {d}")
    files = [e for e in root.glob(f"**/*_{d}.txt") if e.is_file()]
    if len(files) == 3:
        print(files)
Prints:
processing 20201201
[PosixPath('/tmp/test/dir_2/YY_20201201.txt'), PosixPath('/tmp/test/dir_3/ZZ_20201201.txt'), PosixPath('/tmp/test/dir_1/XX_20201201.txt')]
processing 20201202
[PosixPath('/tmp/test/dir_2/YY_20201202.txt'), PosixPath('/tmp/test/dir_3/ZZ_20201202.txt'), PosixPath('/tmp/test/dir_1/XX_20201202.txt')]
processing 20201203
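Note that `len(files) == 3` only counts matches, so it would also pass if one directory held two files for the same date. Below is a minimal sketch of a stricter check. It assumes every immediate subdirectory of the root must contribute a file, and that the date is whatever follows the last underscore in the file name. The helper names `dates_in` and `complete_dates` are made up for illustration:

```python
from pathlib import Path

def dates_in(directory):
    """Return the set of date strings taken from *_<date>.txt names in directory."""
    return {p.stem.rsplit('_', 1)[1]
            for p in Path(directory).glob('*_*.txt') if p.is_file()}

def complete_dates(root, missing_dates):
    """Return, sorted, the missing dates that have a file in every subdirectory of root."""
    subdirs = [d for d in Path(root).iterdir() if d.is_dir()]
    if not subdirs:
        return []
    # a date qualifies only if it shows up in all subdirectories
    common = set.intersection(*(dates_in(d) for d in subdirs))
    return sorted(common & set(missing_dates))
```

With the tree above, `complete_dates('/tmp/test', ['20201201', '20201202', '20201203'])` should leave out 20201203, and for each remaining date `root.glob(f"*/*_{d}.txt")` gives the files to combine.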
You can do the same thing with `os.walk` and `glob.glob`, but it is just more work...
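For completeness, here is a rough sketch of the `os.walk` route, which is closer to the loop in the question. It groups paths by the date in the file name, so each complete date can then be handled in one go. The function name `group_by_date` and the `n_dirs` parameter are my own, and the regex assumes names of the form `<prefix>_<8 digits>.txt`:

```python
import os
import re
from collections import defaultdict

# assumed naming scheme: anything, an underscore, 8 digits, then .txt
DATE_RE = re.compile(r'_(\d{8})\.txt$')

def group_by_date(top, missing_dates, n_dirs=3):
    """Map each wanted date to its file paths, keeping only dates found in n_dirs directories."""
    wanted = set(missing_dates)
    by_date = defaultdict(list)
    for root, dirs, files in os.walk(top):
        for name in files:
            m = DATE_RE.search(name)
            if m and m.group(1) in wanted:
                by_date[m.group(1)].append(os.path.join(root, name))
    # keep a date only if its files come from n_dirs distinct directories
    return {d: sorted(paths) for d, paths in by_date.items()
            if len({os.path.dirname(p) for p in paths}) == n_dirs}
```

You could then loop with `for date, paths in group_by_date(counter_part, missing_dates).items():` and run your existing csv-reading code on each path in `paths`, writing one output file per date.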
Upvotes: 1