JD Roberson

Reputation: 599

Moving files only if matching file exists

I have an application that requires two files to process data: a zip file containing the actual data, and a control file that says what to do with that data.

These files are downloaded via SFTP to a staging directory. Once the zip file is complete, I need to check whether the control file is there as well. The two files share only a naming prefix (e.g. 100001_ABCDEF_123456.zip is paired with 100001_ABCDEF_control_file.ctl).

I am trying to find a way to wait for the zip file to finish downloading and then move the files on the fly, while maintaining the directory structure, as that is important for the next step in processing.

Currently I wait until the SFTP worker finishes and then call robocopy to move everything. I would like a more polished approach.

I have tried several things and always get the same result: files download but never move. For some reason I just cannot get the comparison to work correctly.

I have tried using a FileSystemWatcher to look for the rename from filepart to zip, but it seems to miss several downloads, and for some reason the function dies when it reaches the foreach that searches the directory for the control file. Below is the setup for the FileSystemWatcher, followed by the event handler, which I attach to both Created and Changed.

        watcher.Path = @"C:\Sync\";
        watcher.IncludeSubdirectories = true;
        watcher.EnableRaisingEvents = true;
        watcher.Filter = "*.zip";
        watcher.NotifyFilter = NotifyFilters.Attributes |
                               NotifyFilters.CreationTime |
                               NotifyFilters.FileName |
                               NotifyFilters.LastAccess |
                               NotifyFilters.LastWrite |
                               NotifyFilters.Size |
                               NotifyFilters.Security | 
                               NotifyFilters.CreationTime | 
                               NotifyFilters.DirectoryName;
        watcher.Created += Watcher_Changed;
        watcher.Changed += Watcher_Changed;

 private void Watcher_Changed(object sender, FileSystemEventArgs e)
    {
        var dir = new DirectoryInfo(e.FullPath.Substring(0, e.FullPath.Length - e.Name.Length));
        var files = dir.GetFiles();

        FileInfo zipFile = new FileInfo(e.FullPath);

        foreach (FileInfo file in files)
        {
            MessageBox.Show(file.Extension);
            if (file.Extension == "ctl" && file.Name.StartsWith(e.Name.Substring(0, (e.Name.Length - 14))))
            {
                file.CopyTo(@"C:\inp\");
                zipFile.CopyTo(@"C:\inp\");
            }
        }
    }

Upvotes: 3

Views: 218

Answers (2)

Alex

Reputation: 13224

The FileSystemWatcher class is notoriously tricky to use correctly, because you will get multiple events for a single file that is being written to, moved or copied, as @WillStoltenberg also mentioned in his answer.

I have found that it is much easier to set up a task that runs periodically (e.g. every 30 seconds). For your problem, you could do something like the below. Note that a similar implementation using a Timer, instead of Task.Delay, may be preferable.

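using System;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public class MyPeriodicWatcher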
{
    private readonly string _watchPath;
    private readonly string _searchMask;
    private readonly Func<string, string> _commonPrefixFetcher;
    private readonly Action<FileInfo, FileInfo> _pairProcessor;
    private readonly TimeSpan _checkInterval;
    private readonly CancellationToken _cancelToken;

    public MyPeriodicWatcher(
        string watchPath,
        string searchMask,
        Func<string, string> commonPrefixFetcher,
        Action<FileInfo, FileInfo> pairProcessor,
        TimeSpan checkInterval,
        CancellationToken cancelToken)
    {
        _watchPath = watchPath;
        _searchMask = string.IsNullOrWhiteSpace(searchMask) ? "*.zip" : searchMask;
        _pairProcessor = pairProcessor;
        _commonPrefixFetcher = commonPrefixFetcher;
        _cancelToken = cancelToken;
        _checkInterval = checkInterval;
    }

    public async Task Watch()
    {
        while (!_cancelToken.IsCancellationRequested)
        {
            try
            {
                // EnumerateFiles returns paths that already include _watchPath.
                foreach (var file in Directory.EnumerateFiles(_watchPath, _searchMask))
                {
                    // Pass only the file name, so the delegate need not strip the directory.
                    var pairPrefix = _commonPrefixFetcher(Path.GetFileName(file));
                    if (!string.IsNullOrWhiteSpace(pairPrefix))
                    {
                        var match = Directory.EnumerateFiles(_watchPath, pairPrefix + "*.ctl").FirstOrDefault();
                        if (!string.IsNullOrEmpty(match) && !_cancelToken.IsCancellationRequested)
                            _pairProcessor(new FileInfo(file), new FileInfo(match));
                    }
                    if (_cancelToken.IsCancellationRequested)
                        break;
                }
                if (_cancelToken.IsCancellationRequested)
                    break;

                // On cancellation, Task.Delay throws TaskCanceledException,
                // which derives from OperationCanceledException.
                await Task.Delay(_checkInterval, _cancelToken).ConfigureAwait(false);
            }
            catch (OperationCanceledException)
            {
                break;
            }
        }
    }
}

You will need to provide it with

  • the path to monitor
  • the search mask for the first file (e.g. *.zip)
  • a function delegate that gets the common file name prefix from the zip file name
  • the delegate that performs the move and receives the FileInfo for each file of the pair to be processed / moved
  • an interval
  • a cancellation token to cleanly cancel monitoring

In your pairProcessor delegate, catch IO exceptions, and check for a sharing violation (which likely means writing the file has not yet completed).
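
As a rough sketch of such a delegate body, assuming Windows (where a sharing violation surfaces as Win32 error 32 in the low 16 bits of IOException.HResult). PairMover, TryMovePair and targetDir are illustrative names, not part of the class above:

```csharp
using System;
using System.IO;

public static class PairMover
{
    // Win32 ERROR_SHARING_VIOLATION: another process still has the file open.
    private const int SharingViolation = 32;

    // True when the IOException carries a Win32 sharing violation (Windows-specific check).
    public static bool IsSharingViolation(IOException ex)
    {
        return (ex.HResult & 0xFFFF) == SharingViolation;
    }

    // Moves both files into targetDir. Returns false when a file is still in use,
    // so that the next polling pass can try the pair again.
    public static bool TryMovePair(FileInfo zip, FileInfo ctl, string targetDir)
    {
        Directory.CreateDirectory(targetDir);
        string zipTarget = Path.Combine(targetDir, zip.Name);
        string ctlTarget = Path.Combine(targetDir, ctl.Name);

        try
        {
            File.Move(zip.FullName, zipTarget);
        }
        catch (IOException ex) when (IsSharingViolation(ex))
        {
            return false; // zip is still being written
        }

        try
        {
            File.Move(ctl.FullName, ctlTarget);
            return true;
        }
        catch (IOException ex) when (IsSharingViolation(ex))
        {
            // Undo the first move so the pair stays together for the retry.
            File.Move(zipTarget, zip.FullName);
            return false;
        }
    }
}
```

It could then be passed to the constructor as (zip, ctl) => PairMover.TryMovePair(zip, ctl, @"C:\inp\"). Other IOExceptions (for example, a target file that already exists) are deliberately left to propagate.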

Upvotes: 0

Will Stoltenberg

Reputation: 11

Watcher_Changed is going to get called for all sorts of things, and not every time it's called will you want to react to it.

The first thing you should do in the event handler is try to exclusively open zipFile. If you cannot do it, ignore this event and wait for another event. If this is an FTP server, every time a new chunk of data is written to disk, you'll get a changed event. You could also put something on a "retry" queue or use some other mechanism to check whether the file is available at a later time. I have a similar need in our system, and we try every 5 seconds after we notice a first change. Only once we can exclusively open the file for writing do we allow it to move on to the next step.
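
A minimal sketch of that exclusive-open check. FileReadiness and CanOpenExclusively are illustrative names; the only real API used is the FileStream constructor with FileShare.None:

```csharp
using System;
using System.IO;

public static class FileReadiness
{
    // Returns true if the file can be opened for read/write with no sharing,
    // which suggests the writer has closed it. Returns false while it is
    // locked, missing, or otherwise inaccessible.
    public static bool CanOpenExclusively(string path)
    {
        try
        {
            using (new FileStream(path, FileMode.Open, FileAccess.ReadWrite, FileShare.None))
            {
                return true;
            }
        }
        catch (IOException)
        {
            // Covers sharing violations as well as FileNotFoundException.
            return false;
        }
        catch (UnauthorizedAccessException)
        {
            return false;
        }
    }
}
```

In the handler this would be an early return: if (!FileReadiness.CanOpenExclusively(e.FullPath)) return; and a later event or a retry timer picks the file up once the download has finished.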

I would tighten up your assumptions about what the filename looks like. You're limiting the search to *.zip, but don't depend on only your .zip files existing in that target directory. Validate that the parsing you're doing of the filename isn't hitting unexpected values. You may also want to check dir.Exists before calling dir.GetFiles(). That could be throwing exceptions.
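
One way to make that validation explicit is sketched below. The naming pattern is an assumption inferred from the single example in the question (digits, underscore, an alphanumeric block, underscore, digits, .zip), and PairNames is an illustrative name. Note also that FileInfo.Extension includes the leading dot, so a comparison against "ctl" without the dot never matches:

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class PairNames
{
    // Assumed naming scheme, e.g. 100001_ABCDEF_123456.zip.
    // Adjust the pattern to the real convention.
    private static readonly Regex ZipName =
        new Regex(@"^(?<prefix>\d+_[A-Za-z0-9]+)_\d+\.zip$", RegexOptions.IgnoreCase);

    // Returns the shared prefix (e.g. "100001_ABCDEF"), or null for unexpected names.
    public static string GetPrefix(string zipFileName)
    {
        Match m = ZipName.Match(Path.GetFileName(zipFileName));
        return m.Success ? m.Groups["prefix"].Value : null;
    }

    // Finds the control file that belongs to the zip, or null if it has not arrived.
    public static FileInfo FindControlFile(FileInfo zip)
    {
        string prefix = GetPrefix(zip.Name);
        DirectoryInfo dir = zip.Directory;
        if (prefix == null || dir == null || !dir.Exists)
            return null;

        foreach (FileInfo candidate in dir.GetFiles(prefix + "_*"))
        {
            // Extension is ".ctl" (with the dot), never "ctl".
            if (string.Equals(candidate.Extension, ".ctl", StringComparison.OrdinalIgnoreCase))
                return candidate;
        }
        return null;
    }
}
```

A zip whose name does not fit the pattern yields null and is simply skipped, rather than producing a Substring call with a negative or misleading length.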

As to missing events, see this good answer on buffer overflows: FileSystemWatcher InternalBufferOverflow
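
For illustration, a watcher setup that reduces the chance of an overflow and reports one when it happens might look like the sketch below. ZipWatcherFactory, onZipReady and onOverflow are hypothetical names supplied by the caller; InternalBufferSize, the Error event and InternalBufferOverflowException are part of System.IO. It also subscribes to Renamed, on the assumption that the SFTP client writes to a temporary name and renames the file to *.zip at the end:

```csharp
using System;
using System.IO;

public static class ZipWatcherFactory
{
    // onZipReady and onOverflow are hypothetical callbacks supplied by the caller.
    public static FileSystemWatcher Create(string path, Action<string> onZipReady, Action onOverflow)
    {
        var watcher = new FileSystemWatcher(path, "*.zip")
        {
            IncludeSubdirectories = true,
            // Watch only what is needed; every extra flag adds events to the buffer.
            NotifyFilter = NotifyFilters.FileName | NotifyFilters.LastWrite,
            // The default is 8 KB; 64 KB is the documented maximum.
            InternalBufferSize = 64 * 1024
        };

        // A rename to *.zip raises Renamed, not Created or Changed.
        watcher.Renamed += (sender, e) => onZipReady(e.FullPath);

        // Error is raised when events were lost; a common recovery is to rescan the directory.
        watcher.Error += (sender, e) =>
        {
            if (e.GetException() is InternalBufferOverflowException)
                onOverflow();
        };

        watcher.EnableRaisingEvents = true;
        return watcher;
    }
}
```

Keeping the handlers short (for example, by only queuing the path for later work) also helps, since events pile up in the buffer while a handler is running.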

Upvotes: 1
