Reputation: 33
My main goal is to check an FTP server at any time for new files and then generate a .txt file containing only the new files. If there are no new files, it produces nothing. Here is what I have so far. I start by copying the file listing from the server into oldlist.txt, then on the next connection to the FTP site I compare the data in newlist.txt and oldlist.txt and write the differences to Temporary FTP file Changes.txt. Each time I connect I replace oldlist.txt with newlist.txt so that I can compare again the next time I connect. Is there a better way to do this? My lists never seem to change between runs. Sorry if this is confusing; thanks.
import os
filename = "oldlist.txt"
testing = "newlist.txt"
tempfilename = "Temporary FTP file Changes.txt"
old = open(filename, "r")
oldlist = old.readlines()
oldlist.sort()
from ftplib import FTP
ftp = FTP("ftpsite", "username", "password")
ftp.set_pasv(False)
newlist = []
ftp.dir(newlist.append)
newlist.sort()
ftp.close()
bob = open(testing, "w")
for nl in newlist:
    bob.write(nl + "\n")
hello = open(tempfilename, "w")
for c in newlist:
    if c not in oldlist:
        hello.write(c + "\n")
bob.close()
old.close()
hello.close()
os.remove("oldlist.txt")
os.rename("newlist.txt", "oldlist.txt")
Upvotes: 3
Views: 970
Reputation: 19347
Your implementation of this scheme is reasonable. I would not choose this scheme to implement automated FTP messaging, if that is what you're doing. There are two weaknesses of this approach:
One scheme that is similar but does not have either of these two problems is to actually store a file on the server with a reserved name, or in a separate place, and use its timestamp (preferably the modification time of the file itself) to decide which files can be safely processed. This "semaphore" file is updated to the current time as the last step in uploading a file. All files with a modification time older than the semaphore timestamp can be processed. Once processed, all files must be deleted out of the upload folder so they won't be processed twice. I have seen this scheme work well in an automated production data flow.
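Here is a minimal sketch of that semaphore scheme, assuming the server supports the MDTM command; the semaphore name LAST_RUN and the process_file step are hypothetical placeholders:

from ftplib import FTP

def mdtm(ftp, name):
    # MDTM replies with a line like "213 20130102120000"; keep only the timestamp
    return ftp.sendcmd('MDTM ' + name).split()[-1]

ftp = FTP('ftpsite', 'username', 'password')
cutoff = mdtm(ftp, 'LAST_RUN')        # time of the last completed upload batch

for name in ftp.nlst():               # plain name listing of the upload folder
    if name == 'LAST_RUN':
        continue
    if mdtm(ftp, name) < cutoff:      # older than the semaphore: upload is finished
        process_file(ftp, name)       # hypothetical processing step
        ftp.delete(name)              # delete so the file is never processed twice

ftp.quit()

The MDTM timestamps compare correctly as plain strings because they are in YYYYMMDDHHMMSS form.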
Upvotes: 0
Reputation: 226346
It's a little easier and faster to convert the lists to sets and not worry about sorting.
for filename in set(newlist) - set(oldlist):
    print 'New file: ', filename
Also, instead of saving the list to a file as raw text, you could use the shelve module to make a persistent store that is conveniently accessible like a regular Python dict.
Otherwise, your code has the virtues of being simple and straightforward.
Here's a worked-out example:
from ftplib import FTP
import shelve
olddir = shelve.open('filelist.shl') # create a persistent dictionary
ftp = FTP('ftp1.freebsd.org')
ftp.login()
result = []
ftp.dir(result.append)
newdir = set(result[1:])
print ' New Files '.center(50, '=')
for line in sorted(set(newdir) - set(olddir)):
    print line
    olddir[line] = ''
print ' Done '.center(50, '=')
olddir.close()
Upvotes: 3