Reputation: 593
I have a folder into which new files are constantly being added. I have a Python script that uses os.listdir() to find these files and then performs analysis on them automatically. However, the files are quite large, so they show up in os.listdir() before they have been completely written/copied. Is there some way to distinguish which files are not still in the process of being written or moved? Comparing sizes with os.path.getsize() doesn't seem to work.
Raspbian Buster on a Pi 4 with Python 3.7.3. I am a noob to programming and Linux.
Thanks!
Upvotes: 0
Views: 738
Reputation: 43
In programming, this is called concurrency: computations happen simultaneously and the order of execution is not guaranteed. In your case, one program begins to read a file before another program has finished writing to it. This particular situation is known as the readers-writers problem and is actually fairly common in embedded systems.
There are a number of solutions to this problem, but the simplest and most common is a lock. The simplest kind of lock protects a resource from being accessed by more than one program at a time; in effect, it makes sure that operations on the resource happen atomically. A lock is implemented as an object that can be acquired or released (these are usually methods of the object). The program tries to acquire the lock in a loop that spins for as long as the acquisition fails. Once acquired (the check is usually a simple if-statement), the lock grants the holding program the ability to execute some block of code, after which the lock is released. Note that what I am referring to as a program is typically called a thread.
In Python, you can use the threading.Lock object. First, create a Lock object:
from threading import Lock
file_lock = Lock()
Then, in each thread, wait to acquire the lock before proceeding. If you set blocking=True, the call will suspend the thread until the lock is acquired, without requiring a loop:
file_lock.acquire(blocking=True)
# atomic operation on the file
file_lock.release()
Note that the same lock object must be shared across the threads. You will need to acquire the lock before reading from or writing to the file, and release it when you are done. That ensures those operations can never happen at the same time.
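Putting it together, a minimal sketch might look like this (the file name and the thread bodies are placeholders, not part of the answer above):

from threading import Lock, Thread

file_lock = Lock()
PATH = "data.txt"  # hypothetical file name

# Create the file up front so the reader never races against its creation
open(PATH, "w").close()

def writer():
    # The "with" statement acquires the lock (blocking) and
    # guarantees it is released, even if the block raises
    with file_lock:
        with open(PATH, "w") as f:
            f.write("a large payload")

def reader():
    with file_lock:
        with open(PATH) as f:
            print("read", len(f.read()), "bytes")

threads = [Thread(target=writer), Thread(target=reader)]
for t in threads:
    t.start()
for t in threads:
    t.join()

Using the lock as a context manager is equivalent to calling acquire(blocking=True) followed by release(). Keep in mind this only works when the reader and writer are threads of the same Python process; it will not protect against a separate program copying files into the folder.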
Upvotes: 1
Reputation: 11624
For a conceptual explanation of atomic and cross-filesystem moves in Python, refer to this discussion of moves in Python (it can really save you time).
You can take the following approaches to deal with your problem:
-> Monitor filesystem events with Pyinotify (see this example of Pyinotify usage). A sketch follows below.
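A hedged sketch of that approach (the folder path is a placeholder): pyinotify's IN_CLOSE_WRITE event fires only when a writer closes a file it had open for writing, which is exactly the "finished copying" signal you need.

import pyinotify

class DoneHandler(pyinotify.ProcessEvent):
    def process_IN_CLOSE_WRITE(self, event):
        # Fires once the writing process closes the file,
        # i.e. the write/copy has completed
        print("ready for analysis:", event.pathname)

wm = pyinotify.WatchManager()
wm.add_watch('/path/to/watched/folder', pyinotify.IN_CLOSE_WRITE)
pyinotify.Notifier(wm, DoneHandler()).loop()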
-> Lock the file for a few seconds using flock, as sketched below.
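A minimal flock sketch (Linux-only, and advisory: it only helps if the program writing the files also takes the lock; the file name is a placeholder):

import fcntl

with open("incoming.dat") as f:
    fcntl.flock(f, fcntl.LOCK_EX)  # block until an exclusive lock is granted
    data = f.read()                # safe: any cooperating writer has finished
    fcntl.flock(f, fcntl.LOCK_UN)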
-> Use lsof to check which processes are currently using a particular file:
from subprocess import check_output, Popen, PIPE, CalledProcessError

try:
    lsout = Popen(['lsof', filename], stdout=PIPE, shell=False)
    check_output(["grep", filename], stdin=lsout.stdout, shell=False)
except CalledProcessError:
    # check_output raises CalledProcessError when grep finds no process
    # using the file, i.e. nothing is still writing to it
    pass

Just write your processing code in the except branch and you are good to go.
-> Run a daemon that monitors the parent folder for any changes, using, e.g., the watchdog library (see this watchdog implementation); a sketch follows.
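A rough watchdog sketch, with a placeholder folder path. Note that on_created fires as soon as the file appears, which can still be before writing finishes, so you would combine it with one of the "is the file in use" checks from this list:

import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class NewFileHandler(FileSystemEventHandler):
    def on_created(self, event):
        if not event.is_directory:
            print("new file:", event.src_path)

observer = Observer()
observer.schedule(NewFileHandler(), path='/path/to/watched/folder')
observer.start()
try:
    while True:
        time.sleep(1)  # the observer does its work on a background thread
except KeyboardInterrupt:
    observer.stop()
observer.join()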
-> You can check whether a file is being used by another process by looping through the PIDs in /proc and inspecting their open file descriptors (assuming you control the program that is continuously adding the new files, so you can identify its PID). See the sketch below.
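A Linux-only sketch of that idea, assuming you already know the writer's PID (the helper name is made up for illustration):

import os

def pid_has_file_open(pid, path):
    # Each entry in /proc/<pid>/fd is a symlink to an open file
    fd_dir = "/proc/%d/fd" % pid
    try:
        return any(os.path.realpath(os.path.join(fd_dir, fd)) == path
                   for fd in os.listdir(fd_dir))
    except (FileNotFoundError, PermissionError):
        return False  # process exited, or we lack permission to inspect it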
-> You can check whether a file has a handle open on it using psutil, as sketched below.
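A hedged psutil sketch (scanning every process is slow, but it needs no knowledge of the writer's PID; the helper name is made up):

import psutil

def file_in_use(path):
    # Ask every process for the files it currently has open
    for proc in psutil.process_iter(['open_files']):
        try:
            if any(f.path == path for f in (proc.info['open_files'] or [])):
                return True
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue
    return False

If file_in_use() returns False, no process holds the file open and it should be safe to analyse.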
Upvotes: 1