Botham Ruosh

Reputation: 13

How to read and index dynamically generated files in Python

How can we read and index dynamically generated files from a source folder in Python, and append only the newly added or unread files to the index when the code is refreshed?

An automation tool continuously puts files (say, xlsx) into the source folder. A Python program then reads all the files present in the folder and plots a graph from them. To optimize performance, we plan not to re-read all the files each time the code/application is refreshed, but only to append the unread files to the index.

An index could be a local variable or table that holds information about the input files, such as which files have already been loaded/read, so that the system knows which ones to read now and which have already been read. The idea is to read each file only once, not all the files after every refresh.

Upvotes: 1

Views: 237

Answers (2)

Akshay Gujar

Reputation: 46

The following code gives you the list of new file names and keeps an index of the files already seen.

These variables are used:

  • bag_of_files : list of file names that have already been processed
  • curr_files : list of file names currently in the source folder
  • new_files : list of file names you are interested in (the ones not yet processed)

Run this code the first time, when bag_of_files is empty.

import os
curr_dir = "D:/2018/Address Matching/Data/Statewise Output/"
bag_of_files = [] #Comment out this line after using 1st time
curr_files = os.listdir(curr_dir)
new_files = []
for file in curr_files:
    if file not in bag_of_files:
        new_files.append(file)
        bag_of_files.append(file)

new_files

Output:

['AP Output.csv',
'Delhi Output.csv',
'Gujrat Output.csv',
'Haryana Output.csv',
'Jharkhand Output V1.csv',
'Jharkhand Output V1.xlsx',
'Jharkhand Output.csv',
'Karnataka Output.csv']

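On every later run in the same Python session, use the following code. The only difference is the line that initialises bag_of_files, which is now commented out so that the previous contents of bag_of_files are reused. Before each run I added some new files to the same folder.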

curr_dir = "D:/2018/Address Matching/Data/Statewise Output/"
#bag_of_files = [] #Comment out this line after using 1st time
curr_files = os.listdir(curr_dir)
new_files = []
for file in curr_files:
    if file not in bag_of_files:
        new_files.append(file)
        bag_of_files.append(file)
new_files

Output:

['Maharashtra Output.csv',
 'MP Output.csv',
 'Punjab Output.csv',
 'Rajsthan Output.csv']

Run it again :)

Output:

['Bihar Output.csv',
 'Tamilnadu Output.csv',
 'Telangana Output.csv',
 'WB Output.csv']
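
Note that bag_of_files above lives only in memory, so it is lost when the Python session restarts. A minimal sketch of one way to keep it across restarts is to store the list in a JSON file. The helper names load_index and find_new_files and the index_path argument are hypothetical, chosen only for illustration, and the sketch assumes the index file is kept outside the source folder.

```python
import json
import os


def load_index(index_path):
    """Return the saved list of processed file names, or [] on the first run."""
    if os.path.exists(index_path):
        with open(index_path, "r", encoding="utf-8") as fh:
            return json.load(fh)
    return []


def find_new_files(source_dir, index_path):
    """Return file names in source_dir that are not yet in the saved index.

    The index file is updated so the same names are not returned again.
    Assumption: index_path is outside source_dir, otherwise the index
    file itself would show up as a "new" file.
    """
    bag_of_files = load_index(index_path)
    seen = set(bag_of_files)  # set gives fast membership checks
    new_files = [f for f in sorted(os.listdir(source_dir)) if f not in seen]
    bag_of_files.extend(new_files)
    with open(index_path, "w", encoding="utf-8") as fh:
        json.dump(bag_of_files, fh)
    return new_files
```

Calling find_new_files(curr_dir, "some/other/folder/index.json") on each refresh would then return only the files added since the previous call, even after a restart.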

Upvotes: 1

Kingsley

Reputation: 14916

To keep the answer simple, you could use os.listdir() to monitor the directory content. Then, to watch for modified files that the program has already indexed, check their modification time with os.stat().
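
As a rough sketch of that idea, the index could map each file name to the modification time seen on the last scan. The function name scan_for_changes and the dict-based index are assumptions made for illustration, not part of any library.

```python
import os


def scan_for_changes(folder, index):
    """Return names of files that are new or modified since the last scan.

    `index` is a dict mapping file name -> last seen modification time;
    it is updated in place so the next scan only reports later changes.
    """
    changed = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue  # skip sub-directories
        mtime = os.stat(path).st_mtime
        if index.get(name) != mtime:
            changed.append(name)
            index[name] = mtime
    return changed
```

Start with index = {} and call scan_for_changes(folder, index) on each refresh; only the returned files need to be read again.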

Upvotes: 0
