Reputation: 13
How can we read and index dynamically generated files from a source folder in Python, and append only the newly added or unread files to the index when the code is refreshed?
An automation tool continuously puts files (say, xlsx) into the source folder. A Python program then reads all the files present in the folder and plots a graph from them. To improve performance, we plan not to re-read every file each time the code/application is refreshed, but only to append the unread files to the index.
The index could be a local variable or table that records information about the input files, such as which files have already been loaded, so that the system knows which ones still need to be read. The idea is to read each file only once, rather than all the files after every refresh.
Upvotes: 1
Views: 237
Reputation: 46
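The following code gives you the list of new file names in the folder.
These variables are used: curr_dir (the folder to scan), bag_of_files (the names already seen), curr_files (the current folder listing) and new_files (the names seen for the first time in this run).
Run this code the first time, when bag_of_files is empty.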
import os
curr_dir = "D:/2018/Address Matching/Data/Statewise Output/"
bag_of_files = [] #Comment out this line after using 1st time
curr_files = os.listdir(curr_dir)
new_files = []
for file in curr_files:
    if file not in bag_of_files:
        new_files.append(file)
        bag_of_files.append(file)
new_files
Output:
['AP Output.csv',
'Delhi Output.csv',
'Gujrat Output.csv',
'Haryana Output.csv',
'Jharkhand Output V1.csv',
'Jharkhand Output V1.xlsx',
'Jharkhand Output.csv',
'Karnataka Output.csv']
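On every later run, use the following code. The only difference is that the bag_of_files = [] line is commented out, so the list built by the previous run is reused. This works as long as the same interpreter session (for example, a notebook) stays open. Before each run below, I added some new files to the same folder.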
curr_dir = "D:/2018/Address Matching/Data/Statewise Output/"
#bag_of_files = [] #Comment out this line after using 1st time
curr_files = os.listdir(curr_dir)
new_files = []
for file in curr_files:
    if file not in bag_of_files:
        new_files.append(file)
        bag_of_files.append(file)
new_files
Output:
['Maharashtra Output.csv',
'MP Output.csv',
'Punjab Output.csv',
'Rajsthan Output.csv']
Run it again :)
Output:
['Bihar Output.csv',
'Tamilnadu Output.csv',
'Telangana Output.csv',
'WB Output.csv']
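Since bag_of_files lives only in memory, it is lost when the interpreter restarts. One way around that is to keep the list of seen names in a small file on disk. The following is only a sketch under my own assumptions: the helper names load_index and find_new_files, the index_path parameter and the choice of JSON are all hypothetical and not part of the code above. The index file should be kept outside the source folder so that it is not listed as an input file.

```python
import json
import os


def load_index(index_path):
    """Return the set of file names recorded by earlier runs (empty if none)."""
    if not os.path.exists(index_path):
        return set()
    with open(index_path, "r", encoding="utf-8") as fh:
        return set(json.load(fh))


def find_new_files(source_dir, index_path):
    """List names in source_dir that are not in the index, then update the index."""
    seen = load_index(index_path)
    new_files = sorted(name for name in os.listdir(source_dir) if name not in seen)
    seen.update(new_files)
    # Write the full set of seen names back so the next run can skip them.
    with open(index_path, "w", encoding="utf-8") as fh:
        json.dump(sorted(seen), fh)
    return new_files
```

Calling find_new_files(curr_dir, index_path) after each refresh would then return only the names that were not present on any earlier call, even across restarts.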
Upvotes: 1
Reputation: 14916
To keep the answer simple, you could use os.listdir() to monitor the directory content. Then, to watch for modified files that the program has already indexed, check their modification time with os.stat().
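As a rough illustration of that idea, here is a minimal sketch that combines os.listdir() with the st_mtime field returned by os.stat(). The function name scan_changes and the dict-based index (file name mapped to last seen modification time) are my own assumptions, not something prescribed above.

```python
import os


def scan_changes(source_dir, index):
    """Return (new, modified) file names and update index in place.

    index maps a file name to the modification time seen on the last scan.
    """
    new, modified = [], []
    for name in sorted(os.listdir(source_dir)):
        path = os.path.join(source_dir, name)
        if not os.path.isfile(path):
            continue  # skip sub-directories
        mtime = os.stat(path).st_mtime
        if name not in index:
            new.append(name)
        elif mtime != index[name]:
            modified.append(name)
        index[name] = mtime  # remember the latest modification time
    return new, modified
```

Only the names returned in new or modified would need to be read again; everything else in the index can be skipped.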
Upvotes: 0