I need to find files in a network location that has many, many subdirectories. My current implementation is based on answers I found on Stack Overflow:
private void PollFolder(string sourceDir)
{
    try
    {
        // var start = DateTime.Now.AddHours(-_filesToGetTimeRangeFromNowInDays);
        var start = DateTime.Now.AddMonths(-5);
        var end = DateTime.Now;
        var filesFromToolDir = Directory.GetFiles(sourceDir, "*.gz", SearchOption.AllDirectories)
            .Where(f => new FileInfo(f).CreationTime >= start
                     && new FileInfo(f).CreationTime <= end)
            .ToArray();
    }
    catch (Exception ex)
    {
    }
}
I have to filter files by creation date within a time range given by the user; here I use the last 5 months as an example. The problem is that for some directories, this function can take up to 5 hours to find the files in the specified range.
My question: is there any way to optimize this search so it runs faster on a network folder with many subdirectories? Is there a better way to look for files?
The provided code will only work on Windows!
Do not, and I can't stress this enough, call it on C: directly when setting recurse to true. You will run out of memory.
I tested it on my D: drive, which contains 1,743 folders and 71,921 files.
Getting all folders and subfolders as well as all files created in the past 5 months took 938 milliseconds.
How to use it:
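FileSearchOptions options = new FileSearchOptions(
    new string[] { "*.*" },
    DateTime.Now.AddMonths(-5),
    DateTime.Now
);
// GetDirectories returns only the sub-directories of root,
// so add root itself to also search the files directly inside it.
List<string> dirs = new List<string> { root };
dirs.AddRange(DirectoryUtil.GetDirectories(root, true));
string[] files = DirectoryUtil.LoadFiles(options, dirs);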
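The usage above first collects every sub-directory path into an array and only then looks for files. As an alternative that is not part of this answer, here is a minimal sketch that streams results one folder at a time using only managed APIs that also exist on .NET Framework. It uses an explicit stack instead of recursion, skips folders it is not allowed to list, and does not follow junctions or symbolic links. StreamingFileSearch.FindFiles is a hypothetical name chosen for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class StreamingFileSearch
{
    // Yields files under 'root' that match 'pattern' and satisfy 'predicate',
    // one folder at a time, instead of first building the full folder list.
    public static IEnumerable<FileInfo> FindFiles(
        string root, string pattern, Func<FileInfo, bool> predicate)
    {
        // An explicit stack replaces recursion, so deep trees cannot overflow the call stack.
        var pending = new Stack<DirectoryInfo>();
        pending.Push(new DirectoryInfo(root));

        while (pending.Count > 0)
        {
            DirectoryInfo current = pending.Pop();
            FileInfo[] files;
            DirectoryInfo[] subDirs;
            try
            {
                files = current.GetFiles(pattern);
                subDirs = current.GetDirectories();
            }
            catch (UnauthorizedAccessException)
            {
                continue; // Skip a folder we are not allowed to list.
            }
            catch (DirectoryNotFoundException)
            {
                continue; // The folder disappeared while the tree was being walked.
            }

            foreach (FileInfo file in files)
            {
                if (predicate(file))
                {
                    yield return file;
                }
            }

            foreach (DirectoryInfo sub in subDirs)
            {
                // Do not follow junctions or symbolic links, which can form cycles.
                if ((sub.Attributes & FileAttributes.ReparsePoint) == 0)
                {
                    pending.Push(sub);
                }
            }
        }
    }
}
```

For the question's case it could be called as StreamingFileSearch.FindFiles(sourceDir, "*.gz", f => f.CreationTime >= start && f.CreationTime <= end). Because results are yielded as each folder is read, processing can start before the whole tree has been walked, and a single unreadable folder does not discard everything found so far.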
FileSearchOptions
using System;
namespace Your.Namespace
{
/// <summary>
/// Contains options for a file search.
/// </summary>
public struct FileSearchOptions
{
/// <summary>
/// Array of file type filters.
/// <para>Text file example: *.txt</para>
/// </summary>
public string[] FileTypes;
/// <summary>
/// The minimum creation timestamp of the file.
/// </summary>
public Nullable<DateTime> CreationTimeMin;
/// <summary>
/// The maximum creation timestamp of the file.
/// </summary>
public Nullable<DateTime> CreationTimeMax;
/// <summary>
/// The minimum last write timestamp of the file.
/// </summary>
public Nullable<DateTime> LastWriteTimeMin;
/// <summary>
/// The maximum last write timestamp of the file.
/// </summary>
public Nullable<DateTime> LastWriteTimeMax;
public FileSearchOptions(
string[] fileTypes,
DateTime? createdMin = null,
DateTime? createdMax = null,
DateTime? lastWriteMin = null,
DateTime? lastWriteMax = null)
{
FileTypes = fileTypes;
CreationTimeMin = createdMin;
CreationTimeMax = createdMax;
LastWriteTimeMin = lastWriteMin;
LastWriteTimeMax = lastWriteMax;
}
}
}
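In LoadFilesInternal further down, a creation-time (or last-write-time) range is applied only when both its Min and its Max are set; if only one bound is given, that criterion is silently ignored. If one-sided ranges are wanted, a small helper that treats each bound independently is enough. TimeRange.Contains is a hypothetical helper, not part of the answer.

```csharp
using System;

public static class TimeRange
{
    // Returns true when 'value' is within [min, max], inclusive.
    // A null bound is treated as open, so a range can have only a start or only an end.
    // DateTime comparisons ignore Kind, so all three values must use the same
    // time base (all local or all UTC).
    public static bool Contains(DateTime value, DateTime? min, DateTime? max)
    {
        if (min.HasValue && value < min.Value)
        {
            return false;
        }
        if (max.HasValue && value > max.Value)
        {
            return false;
        }
        return true;
    }
}
```

With such a helper, the four if/else branches in LoadFilesInternal could collapse into a single condition, TimeRange.Contains(ct, options.CreationTimeMin, options.CreationTimeMax) && TimeRange.Contains(lw, options.LastWriteTimeMin, options.LastWriteTimeMax), which behaves the same when both bounds of a range are set.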
SystemTime
using System;
using System.Runtime.InteropServices;
namespace Your.Namespace
{
[StructLayout(LayoutKind.Sequential, Pack = 2)]
internal struct SystemTime
{
public ushort Year;
public ushort Month;
public ushort DayOfWeek;
public ushort Day;
public ushort Hour;
public ushort Minute;
public ushort Second;
public ushort Milliseconds;
public SystemTime(DateTime dt)
{
dt = dt.ToUniversalTime();
Year = Convert.ToUInt16(dt.Year);
Month = Convert.ToUInt16(dt.Month);
DayOfWeek = Convert.ToUInt16(dt.DayOfWeek);
Day = Convert.ToUInt16(dt.Day);
Hour = Convert.ToUInt16(dt.Hour);
Minute = Convert.ToUInt16(dt.Minute);
Second = Convert.ToUInt16(dt.Second);
Milliseconds = Convert.ToUInt16(dt.Millisecond);
}
public SystemTime(ushort year, ushort month, ushort day, ushort hour = 0, ushort minute = 0, ushort second = 0, ushort millisecond = 0)
{
Year = year;
Month = month;
Day = day;
Hour = hour;
Minute = minute;
Second = second;
Milliseconds = millisecond;
DayOfWeek = 0;
}
public static implicit operator DateTime(SystemTime st)
{
if (st.Year == 0 || st == MinValue)
return DateTime.MinValue;
if (st == MaxValue)
return DateTime.MaxValue;
return new DateTime(st.Year, st.Month, st.Day, st.Hour, st.Minute, st.Second, st.Milliseconds, DateTimeKind.Utc);
}
public static bool operator ==(SystemTime s1, SystemTime s2)
{
return (s1.Year == s2.Year
&& s1.Month == s2.Month
&& s1.Day == s2.Day
&& s1.Hour == s2.Hour
&& s1.Minute == s2.Minute
&& s1.Second == s2.Second
&& s1.Milliseconds == s2.Milliseconds);
}
public static bool operator !=(SystemTime s1, SystemTime s2)
{
return !(s1 == s2);
}
public static readonly SystemTime MinValue, MaxValue;
static SystemTime()
{
MinValue = new SystemTime(1601, 1, 1);
MaxValue = new SystemTime(30827, 12, 31, 23, 59, 59, 999);
}
public override bool Equals(object obj)
{
if (obj is SystemTime)
return ((SystemTime)obj) == this;
return base.Equals(obj);
}
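public override int GetHashCode()
{
// Hash only the fields that operator == compares (DayOfWeek is ignored),
// so values that compare equal also hash equally.
unchecked {
int hash = Year;
hash = hash * 31 + Month;
hash = hash * 31 + Day;
hash = hash * 31 + Hour;
hash = hash * 31 + Minute;
hash = hash * 31 + Second;
hash = hash * 31 + Milliseconds;
return hash;
}
}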
}
}
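The SystemTime struct and the FileTimeToSystemTime P/Invoke are used only to turn a FILETIME into a DateTime. As a possible simplification (not what the answer does), .NET can do that conversion directly with DateTime.FromFileTimeUtc once the two 32-bit halves are combined into one 64-bit value. FileTimeConverter.ToDateTimeUtc is a hypothetical helper name.

```csharp
using System;
using ComTypes = System.Runtime.InteropServices.ComTypes;

public static class FileTimeConverter
{
    // A FILETIME is a 64-bit count of 100-nanosecond intervals since
    // 1601-01-01 UTC, stored as two 32-bit halves.
    public static DateTime ToDateTimeUtc(ComTypes.FILETIME ft)
    {
        // Cast both halves through uint so a set high bit in the low
        // half is not sign-extended into the high half.
        long value = ((long)(uint)ft.dwHighDateTime << 32) | (uint)ft.dwLowDateTime;
        // Returns a DateTime with Kind = Utc.
        return DateTime.FromFileTimeUtc(value);
    }
}
```

In LoadFilesInternal this would replace each FileTimeToSystemTime call, e.g. ct = FileTimeConverter.ToDateTimeUtc(ffd.ftCreationTime). Unlike the SystemTime conversion, DateTime.FromFileTimeUtc throws ArgumentOutOfRangeException for a FILETIME that does not fit in a DateTime, so code that may see such values would need to handle that.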
DirectoryUtil
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.InteropServices;
using ComTypes = System.Runtime.InteropServices.ComTypes;
using System.IO;
using System.Runtime.ConstrainedExecution;
using System.Security;
namespace Your.Namespace
{
internal static class DirectoryUtil
{
//
// Searches a directory for a file or subdirectory
// with a name and attributes that match those specified.
//
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
private static extern IntPtr FindFirstFileExW(
string lpFileName, // The directory or path, and the file name.
FINDEX_INFO_LEVELS fInfoLevelId, // The information level of the returned data.
out WIN32_FIND_DATA lpFindFileData, // A pointer to the buffer that receives the file data.
FINDEX_SEARCH_OPS fSearchOp, // The type of filtering to perform
// that is different from wildcard matching.
IntPtr lpSearchFilter, // A pointer to the search criteria if the specified fSearchOp
// needs structured search information.
int dwAdditionalFlags // Specifies additional flags that control the search.
);
//
// Continues a file search from a previous call to the
// FindFirstFileExW function.
//
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Auto)]
private static extern bool FindNextFile(
IntPtr hFindFile, // The search handle returned by a previous call
// to the FindFirstFileExW function.
out WIN32_FIND_DATA lpFindFileData // A pointer to the WIN32_FIND_DATA structure
// that receives information about the found file or subdirectory.
);
//
// Converts a file time to system time format.
// System time is based on UTC.
//
[DllImport("kernel32.dll", SetLastError = true)]
private static extern bool FileTimeToSystemTime(
[In] ref ComTypes.FILETIME lpFileTime, // A pointer to a FILETIME structure containing
// the file time to be converted to UTC system time.
out SystemTime lpSystemTime // A pointer to a SYSTEMTIME structure to
// receive the converted file time.
);
//
// Contains information about the file that is found by
// the FindFirstFileExW function.
//
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Auto)]
internal struct WIN32_FIND_DATA
{
[MarshalAs(UnmanagedType.U4)]
public FileAttributes dwFileAttributes; // The file attributes of a file.
public ComTypes.FILETIME ftCreationTime; // A FILETIME structure that specifies
// when a file or directory was created.
public ComTypes.FILETIME ftLastAccessTime; // A FILETIME structure that specifies
// when the file was last read from, written to, or run (.exe).
public ComTypes.FILETIME ftLastWriteTime; // A FILETIME structure that specifies
// when the file was last written to, truncated, or overwritten.
public uint nFileSizeHigh; // The high-order DWORD value of the file size, in bytes.
public uint nFileSizeLow; // The low-order DWORD value of the file size, in bytes.
public uint dwReserved0; // If the dwFileAttributes member includes the FILE_ATTRIBUTE_REPARSE_POINT
// attribute, this member specifies the reparse point tag.
public uint dwReserved1; // Reserved for future use.
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
public string cFileName; // The name of the file.
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
public string cAlternateFileName; // An alternative name for the file.
public uint dwFileType; // Obsolete. Do not use.
public uint dwCreatorType; // Obsolete. Do not use.
public uint wFinderFlags; // Obsolete. Do not use.
}
//
// Defines values that are used with the FindFirstFileEx
// function to specify the information level of the returned data.
//
internal enum FINDEX_INFO_LEVELS
{
// The FindFirstFileEx function retrieves a
// standard set of attribute information. The data is returned in a
// WIN32_FIND_DATA structure.
FindExInfoStandard = 0,
// The FindFirstFileEx function does not query the short file name,
// improving overall enumeration speed. The data is returned in a
// WIN32_FIND_DATA structure, and the cAlternateFileName
// member is always a NULL string.
// This value is not supported until Windows Server 2008 R2 and Windows 7.
FindExInfoBasic = 1
}
//
// Defines values that are used with the FindFirstFileEx
// function to specify the type of filtering to perform.
//
internal enum FINDEX_SEARCH_OPS
{
// The search for a file that matches a specified file name.
FindExSearchNameMatch = 0,
// This is an advisory flag.
// If the file system supports directory filtering, the function
// searches for a file that matches the specified name and is also a directory.
// If the file system does not support directory filtering,
// this flag is silently ignored.
FindExSearchLimitToDirectories = 1,
// This filtering type is not available.
FindExSearchLimitToDevices = 2
}
// Searches are case-sensitive.
private const int FIND_FIRST_EX_CASE_SENSITIVE = 1;
// Uses a larger buffer for directory queries,
// which can increase performance of the find operation.
// This value is not supported until Windows Server 2008 R2 and Windows 7.
private const int FIND_FRIST_EX_LARGE_FETCH = 2;
// Limits the results to files that are physically on disk.
// This flag is only relevant when a file virtualization filter is present.
private const int FIND_FIRST_EX_ON_DISK_ENTRIES_ONLY = 4;
// Handle value returned by FindFirstFileExW when it fails.
private static readonly IntPtr INVALID_HANDLE_VALUE = new IntPtr(-1);
// Caught Win32 errors.
private static List<string> _errors = new List<string>();
public static string LastError
{
get {
// Null when no error has been recorded yet.
return _errors.Count > 0 ? _errors[_errors.Count - 1] : null;
}
}
public static string[] Errors
{
get {
return _errors.ToArray();
}
}
//
// Formats a file path to match the search format
// of the FindFirstFileExW function.
//
private static readonly Func<string, string, string> FormatFilePath = (s, f) =>
{
if (s.EndsWith(".") || s.EndsWith("..")) {
return string.Empty;
}
if (s.EndsWith("\\")) {
s += f;
}
if (!(s.EndsWith("\\" + f))) {
s += "\\" + f;
}
if (s == ".\\*" || s == "..\\*") {
return string.Empty;
}
return s;
};
//
// Formats a directory path to match the search format
// of the FindFirstFileExW function.
//
private static readonly Func<string, string> FormatPath = (s) =>
{
if (s.EndsWith(".") || s.EndsWith("..")) {
return string.Empty;
}
if (s.EndsWith("\\")) {
s += "*";
}
if (!(s.EndsWith("\\*"))) {
s += "\\*";
}
if (s == ".\\*" || s == "..\\*") {
return string.Empty;
}
return s;
};
//
// Gets all files in the specified directory
// and adds them to the referenced list object.
//
private static void LoadFilesInternal(
string dir,
string fileType,
FileSearchOptions options,
ref List<string> files)
{
// Get standard set of information.
FINDEX_INFO_LEVELS findLevel = FINDEX_INFO_LEVELS.FindExInfoStandard;
// File name search.
FINDEX_SEARCH_OPS findOps = FINDEX_SEARCH_OPS.FindExSearchNameMatch;
int additionalFlags = 0;
// FindExInfoBasic and the large-fetch flag are only supported
// from Windows 7 / Windows Server 2008 R2 (version 6.1) onwards.
if (Environment.OSVersion.Version >= new Version(6, 1)) {
// Ignore short file names to improve performance.
findLevel = FINDEX_INFO_LEVELS.FindExInfoBasic;
// Use a larger buffer.
additionalFlags = FIND_FRIST_EX_LARGE_FETCH;
}
// Format path to match FindFirstFileExW pattern.
string search = FormatFilePath(dir, fileType);
if (string.IsNullOrEmpty(search)) {
return;
}
WIN32_FIND_DATA ffd;
// Try to get handle to first file system object.
IntPtr hFind = FindFirstFileExW(
search, findLevel,
out ffd, findOps,
IntPtr.Zero, additionalFlags
);
// FindFirstFileExW failed...
if (INVALID_HANDLE_VALUE == hFind) {
int err = Marshal.GetLastWin32Error();
_errors.Add("FindFirstFileExW returned Win32 error: " + err);
return;
}
// Stores the end of the path without search pattern.
// Used to create new file name.
string end = string.Empty;
// Stores the new concatenated file name.
string newDir = string.Empty;
// SystemTime used to convert WinAPI FILETIME to DateTime.
SystemTime st;
// Stores the file creation time.
DateTime ct;
// Stores the file last write time.
DateTime lw;
// Check if options has a creation time timespan.
bool hasCreationTime = options.CreationTimeMin.HasValue && options.CreationTimeMax.HasValue;
// Check if options has a last write time timespan.
bool hasWriteTime = options.LastWriteTimeMin.HasValue && options.LastWriteTimeMax.HasValue;
do
{
// Ignore if handle points to directory.
if ((ffd.dwFileAttributes & FileAttributes.Directory)
== FileAttributes.Directory)
{
continue;
}
// Ignore if handle points to current directory
// or top-level directory.
if (ffd.cFileName == ".." || ffd.cFileName == ".") {
continue;
}
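// Convert FILETIME to SystemTime and SystemTime to a .NET DateTime.
// The result is UTC, but DateTime comparisons ignore Kind, so convert it
// to local time to match the option bounds (assumed to be local times,
// as in the DateTime.Now usage example).
FileTimeToSystemTime(ref ffd.ftCreationTime, out st);
ct = ((DateTime)st).ToLocalTime();
FileTimeToSystemTime(ref ffd.ftLastWriteTime, out st);
lw = ((DateTime)st).ToLocalTime();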
// No creation time or write is specified.
if (!(hasCreationTime) && !(hasWriteTime)) {
end = search.Replace("\\" + fileType, string.Empty);
newDir = end + Path.DirectorySeparatorChar + ffd.cFileName;
files.Add(newDir);
}
// Creation time is specified and write time is not.
else if ((hasCreationTime && !(hasWriteTime)) &&
(ct <= options.CreationTimeMax
&& ct >= options.CreationTimeMin))
{
end = search.Replace("\\" + fileType, string.Empty);
newDir = end + Path.DirectorySeparatorChar + ffd.cFileName;
files.Add(newDir);
}
// Creation time is not specified and write time is.
else if ((!(hasCreationTime) && hasWriteTime) &&
lw <= options.LastWriteTimeMax
&& lw >= options.LastWriteTimeMin)
{
end = search.Replace("\\" + fileType, string.Empty);
newDir = end + Path.DirectorySeparatorChar + ffd.cFileName;
files.Add(newDir);
}
// Creation time and write time are specified.
else if (hasCreationTime && hasWriteTime &&
(ct <= options.CreationTimeMax
&& ct >= options.CreationTimeMin) &&
lw <= options.LastWriteTimeMax
&& lw >= options.LastWriteTimeMin)
{
end = search.Replace("\\" + fileType, string.Empty);
newDir = end + Path.DirectorySeparatorChar + ffd.cFileName;
files.Add(newDir);
}
}
while (FindNextFile(hFind, out ffd));
}
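/// <summary>
/// Loads all files in the referenced directories.
/// </summary>
/// <param name="options">Options for the file search.</param>
/// <param name="directories">The directories to search; sub-directories are only searched if they are in this collection.</param>
/// <returns>The full paths of the matching files.</returns>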
public static string[] LoadFiles(FileSearchOptions options, ICollection<string> directories)
{
if (options.FileTypes == null || options.FileTypes.Length == 0) {
options.FileTypes = new string[] { "*.*" };
}
// Iterate through directories and add all filtered files.
List<string> files = new List<string>();
foreach (string filter in options.FileTypes) {
foreach (string dir in directories) {
LoadFilesInternal(dir, filter, options, ref files);
}
}
return files.ToArray();
}
/// <summary>
/// Adds all directories within the root directory
/// to an existing array of directories.
/// </summary>
/// <param name="root">The root directory.</param>
/// <param name="directories">The array to add the new directories.</param>
/// <param name="recursive">TRUE to get all sub-directories recursively.</param>
public static void AddDirectory(string root, ref string[] directories, bool recursive = false)
{
if (!(directories.Contains(root)) && !(recursive)) {
string[] dirNew = new string[directories.Length + 1];
// Buffer.BlockCopy only accepts arrays of primitive types, so use Array.Copy for strings.
Array.Copy(directories, dirNew, directories.Length);
dirNew[dirNew.Length - 1] = root;
directories = dirNew;
}
else if (!(directories.Contains(root)) && recursive) {
List<string> dirTemp = new List<string>();
dirTemp.AddRange(directories);
GetDirectoriesRecursInternal(root, ref dirTemp);
directories = dirTemp.ToArray();
}
}
//
// Gets all the directories and sub-directories
// in the specified root directory.
//
private static void GetDirectoriesRecursInternal(string dir, ref List<string> directories)
{
// Get standard set of information.
FINDEX_INFO_LEVELS findLevel = FINDEX_INFO_LEVELS.FindExInfoStandard;
// File name search.
FINDEX_SEARCH_OPS findOps = FINDEX_SEARCH_OPS.FindExSearchNameMatch;
int additionalFlags = 0;
// FindExInfoBasic and the large-fetch flag are only supported
// from Windows 7 / Windows Server 2008 R2 (version 6.1) onwards.
if (Environment.OSVersion.Version >= new Version(6, 1)) {
// Ignore short file names to improve performance.
findLevel = FINDEX_INFO_LEVELS.FindExInfoBasic;
// Use a larger buffer.
additionalFlags = FIND_FRIST_EX_LARGE_FETCH;
}
// Format path to match FindFirstFileExW pattern.
dir = FormatPath(dir);
if (string.IsNullOrEmpty(dir)) {
return;
}
WIN32_FIND_DATA ffd;
// Try to get handle to first file system object.
IntPtr hFind = FindFirstFileExW(
dir, findLevel,
out ffd, findOps,
IntPtr.Zero, additionalFlags
);
// FindFirstFileExW failed...
if (INVALID_HANDLE_VALUE == hFind) {
int err = Marshal.GetLastWin32Error();
_errors.Add("FindFirstFileExW returned Win32 error: " + err);
return;
}
// Stores end of directory name without search pattern.
// Used to create new directory name.
string end = string.Empty;
// Stores the new concatenated directory name.
string newDir = string.Empty;
do
{
// Check if handle points to directory.
if ((ffd.dwFileAttributes & FileAttributes.Directory)
== FileAttributes.Directory)
{
// Ignore if handle points to current directory
// or top-level directory.
if (ffd.cFileName != ".." && ffd.cFileName != ".") {
// Remove wildcard from current directory.
end = dir.Replace("\\*", string.Empty);
// Create new directory name.
newDir = end + Path.DirectorySeparatorChar + ffd.cFileName;
directories.Add(newDir);
GetDirectoriesRecursInternal(newDir, ref directories);
}
}
} while (FindNextFile(hFind, out ffd));
}
//
// Gets all the directories in the specified root directory.
//
private static void GetDirectoriesInternal(string root, ref List<string> directories)
{
// Get standard set of information.
FINDEX_INFO_LEVELS findLevel = FINDEX_INFO_LEVELS.FindExInfoStandard;
// File name search.
FINDEX_SEARCH_OPS findOps = FINDEX_SEARCH_OPS.FindExSearchNameMatch;
int additionalFlags = 0;
// FindExInfoBasic and the large-fetch flag are only supported
// from Windows 7 / Windows Server 2008 R2 (version 6.1) onwards.
if (Environment.OSVersion.Version >= new Version(6, 1)) {
// Ignore short file names to improve performance.
findLevel = FINDEX_INFO_LEVELS.FindExInfoBasic;
// Use a larger buffer.
additionalFlags = FIND_FRIST_EX_LARGE_FETCH;
}
// Format path to match FindFirstFileExW pattern.
root = FormatPath(root);
if (string.IsNullOrEmpty(root)) {
return;
}
WIN32_FIND_DATA ffd;
// Try to get handle to first file system object.
IntPtr hFind = FindFirstFileExW(
root, findLevel,
out ffd, findOps,
IntPtr.Zero, additionalFlags
);
// FindFirstFileExW failed...
if (INVALID_HANDLE_VALUE == hFind) {
int err = Marshal.GetLastWin32Error();
_errors.Add("FindFirstFileExW returned Win32 error: " + err);
return;
}
// Stores end of directory name without search pattern.
// Used to create new directory name.
string end = string.Empty;
// Stores the new concatenated directory name.
string newDir = string.Empty;
do
{
// Check if handle points to a directory.
if ((ffd.dwFileAttributes & FileAttributes.Directory)
== FileAttributes.Directory)
{
// Ignore if handle points to current directory
// or top-level directory.
if (ffd.cFileName != ".." && ffd.cFileName != ".") {
// Remove wildcard from current directory.
end = root.Replace("\\*", string.Empty);
// Create new directory name.
newDir = end + Path.DirectorySeparatorChar + ffd.cFileName;
directories.Add(newDir);
}
}
} while (FindNextFile(hFind, out ffd));
}
/// <summary>
/// Gets all directories from the root directory.
/// </summary>
/// <param name="root">The root directory.</param>
/// <param name="recurse">TRUE to get all sub-directories recursively.</param>
/// <returns>The full paths of the sub-directories; the root directory itself is not included.</returns>
public static string[] GetDirectories(string root, bool recurse = true)
{
List<string> list = new List<string>();
if (recurse) {
// Get all sub-directories.
GetDirectoriesRecursInternal(root, ref list);
}
else {
// Top-level directories only.
GetDirectoriesInternal(root, ref list);
}
return list.ToArray();
}
}
}
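None of the functions above call FindClose on the handle returned by FindFirstFileExW, so every directory that is searched leaks one search handle until the process exits. One common way to fix that, sketched here under the assumption that the rest of DirectoryUtil is adapted to match, is a SafeHandle subclass whose ReleaseHandle calls FindClose. SafeFindHandle is a name chosen for this sketch.

```csharp
using System;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

// Wraps a search handle from FindFirstFileExW so that FindClose is called
// when the handle is disposed or finalized.
internal sealed class SafeFindHandle : SafeHandleZeroOrMinusOneIsInvalid
{
    // The P/Invoke marshaler creates instances through this constructor.
    public SafeFindHandle() : base(true) { }

    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool FindClose(IntPtr hFindFile);

    // Only called for a valid handle (not 0 and not -1).
    protected override bool ReleaseHandle()
    {
        return FindClose(handle);
    }
}
```

FindFirstFileExW could then be declared to return SafeFindHandle instead of IntPtr, FindNextFile could take a SafeFindHandle, the INVALID_HANDLE_VALUE comparison becomes hFind.IsInvalid, and each do/while loop can sit inside a using block so the handle is closed even if an exception is thrown.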
You're making more network requests than you need. Directory.GetFiles() makes a network request, but returns only strings (the file paths). Then you call new FileInfo(f).CreationTime twice, and because you create a new FileInfo object each time, that makes two more network requests per file to get the same information.
You can cut this down by using DirectoryInfo.EnumerateFiles(), which returns FileInfo objects rather than just the file names. This way, you get the creation time as part of the results.
var start = DateTime.Now.AddMonths(-5);
var end = DateTime.Now;
var dir = new DirectoryInfo(sourceDir);
var filesFromToolDir = dir.EnumerateFiles("*.gz", SearchOption.AllDirectories)
    .Where(f => f.CreationTime >= start
             && f.CreationTime <= end)
    .ToArray();
Ideally, you want to only ask the server for the information you need, rather than requesting everything and discarding results. Unfortunately, you can't do that here.
This may only be relevant to .NET Core: EnumerateFiles (in both the Directory and DirectoryInfo classes) uses the native Windows NtQueryDirectoryFile function, as the .NET source code shows, and that function only lets you filter by file name.
In fact, the .NET code isn't great here, because it always passes null for the FileName parameter. So even if you ask for *.gz, it still gets every file from the server and filters the list locally.
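On .NET Core 2.1 and later (not .NET Framework), System.IO.Enumeration.FileSystemEnumerable<T> lets you run the name and creation-time checks on each raw directory entry, so no FileInfo is allocated for files that are discarded. It still receives every entry from the server, for the reason given above, so treat this as a sketch that trims client-side work rather than network traffic. CreationTimeSearch.FindFiles is a hypothetical name, and the bounds are assumed to be in UTC.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Enumeration;

public static class CreationTimeSearch
{
    // Returns full paths of files under 'root' whose name matches 'pattern'
    // and whose creation time lies within [startUtc, endUtc].
    public static IEnumerable<string> FindFiles(
        string root, string pattern, DateTime startUtc, DateTime endUtc)
    {
        var options = new EnumerationOptions
        {
            RecurseSubdirectories = true,
            // Skip folders that cannot be read instead of throwing.
            IgnoreInaccessible = true,
            // Skip reparse points (junctions, symbolic links) so linked folders are
            // not followed; this replaces the default, so hidden and system files are included.
            AttributesToSkip = FileAttributes.ReparsePoint
        };

        return new FileSystemEnumerable<string>(
            root,
            // Only entries accepted by the predicate below are turned into strings.
            (ref FileSystemEntry entry) => entry.ToFullPath(),
            options)
        {
            // Runs against each raw directory entry; no FileInfo is created.
            ShouldIncludePredicate = (ref FileSystemEntry entry) =>
            {
                if (entry.IsDirectory)
                {
                    return false;
                }
                if (!FileSystemName.MatchesSimpleExpression(pattern, entry.FileName))
                {
                    return false;
                }
                DateTime created = entry.CreationTimeUtc.UtcDateTime;
                return created >= startUtc && created <= endUtc;
            }
        };
    }
}
```

For the question's case this would be called as CreationTimeSearch.FindFiles(sourceDir, "*.gz", DateTime.UtcNow.AddMonths(-5), DateTime.UtcNow). The results are produced lazily, so they can be processed while the enumeration is still running.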