Reputation: 109
I recently bumped into some weird behavior from Microsoft:
Let's assume our folder c:\tmp123 contains 3 files:
1.txt
2.txtx
3.txtxt
a) Calling Directory.GetFiles(@"C:\tmp123", "*.txt") returns 3 items.
b) Calling Directory.GetFiles(@"C:\tmp123", "*.txtx") returns 1 item.
According to Microsoft, this is the expected behavior (see the Note in MSDN).
My questions are:
Why did Microsoft decide on such strange behavior?
How can I overcome this problem?
i.e. how do I write a search pattern that returns files with the *.txt extension only, and not *.txtx, *.txtstrangefunctionality, etc.?
Upvotes: 7
Views: 1688
Reputation: 414
Here is another workaround that will help with filtering out files with extensions such as ".txtxt":
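// requires using System; and using System.Linq; (DirectoryInfo.GetFiles returns FileInfo objects, which have an Extension property)
var files = new System.IO.DirectoryInfo(@"C:\tmp123").GetFiles("*.txt").Where(fi => fi.Extension.Equals(".txt", StringComparison.OrdinalIgnoreCase));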
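Directory.GetFiles returns plain path strings, which have no Extension property, whereas DirectoryInfo.GetFiles returns FileInfo objects, which do. Below is a self-contained sketch of this workaround as a .NET 6+ top-level program. The throwaway temp folder and the three sample files are assumptions standing in for c:\tmp123. Extensions are compared with StringComparison.OrdinalIgnoreCase rather than ToLower().

```csharp
using System;
using System.IO;
using System.Linq;

// Throwaway folder standing in for the question's c:\tmp123 (an assumption for the demo).
string dir = Path.Combine(Path.GetTempPath(), "tmp123-" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(dir);
foreach (string name in new[] { "1.txt", "2.txtx", "3.txtxt" })
    File.WriteAllText(Path.Combine(dir, name), "");

// DirectoryInfo.GetFiles returns FileInfo objects, so Extension is available.
// "*.txt" may over-match on Windows (as in the question), so re-check the extension exactly.
string[] txtOnly = new DirectoryInfo(dir)
    .GetFiles("*.txt")
    .Where(fi => fi.Extension.Equals(".txt", StringComparison.OrdinalIgnoreCase))
    .Select(fi => fi.Name)
    .ToArray();

Console.WriteLine(string.Join(", ", txtOnly)); // prints "1.txt"

Directory.Delete(dir, recursive: true); // clean up the demo folder
```

The post-filter makes the result independent of how the search pattern happens to be interpreted, so the same code works whether or not GetFiles over-matches.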
Upvotes: 0
Reputation: 13350
If you want a workaround, you could simply retrieve all the file paths
var files = Directory.GetFiles(@"C:\tmp123");
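and then filter them by extension as needed, comparing case-insensitively because Windows file names usually are:
// requires using System; and using System.Linq;
var txtFiles = files.Where(f => f.EndsWith(".txt", StringComparison.OrdinalIgnoreCase));
var txtxFiles = files.Where(f => f.EndsWith(".txtx", StringComparison.OrdinalIgnoreCase));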
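A sketch of the same filtering as a reusable helper, written as a .NET 6+ top-level program. FilterByExtension is a hypothetical name, and the sample names stand in for what Directory.GetFiles(@"C:\tmp123") would return. It compares the result of Path.GetExtension ordinally and case-insensitively, so README.TXT counts as a .txt file.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper: keep only paths whose extension is exactly `extension`,
// compared ordinally and case-insensitively.
static string[] FilterByExtension(string[] paths, string extension) =>
    paths.Where(p => string.Equals(Path.GetExtension(p), extension, StringComparison.OrdinalIgnoreCase))
         .ToArray();

// Sample names standing in for the result of Directory.GetFiles(@"C:\tmp123").
string[] files = { "1.txt", "2.txtx", "3.txtxt", "README.TXT" };

string[] txtFiles = FilterByExtension(files, ".txt");
string[] txtxFiles = FilterByExtension(files, ".txtx");

Console.WriteLine(string.Join(", ", txtFiles));  // prints "1.txt, README.TXT"
Console.WriteLine(string.Join(", ", txtxFiles)); // prints "2.txtx"
```

Path.GetExtension returns the text from the last dot onward, so a name like archive.tar.gz reports ".gz". Keep that in mind when filtering multi-part extensions.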
Upvotes: 2
Reputation: 5458
The reason for this is backwards compatibility.
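Windows was initially built as a graphical interface on top of MS-DOS, whose file names had at most 8 characters for the name and 3 for the extension. Extensions to the MS-DOS file system allowed Windows to have longer file names and extensions, but these files still show up with 8.3 names in MS-DOS. Windows keeps generating those short aliases (2.txtx typically gets something like 2~1.TXT), and the Win32 search matches the pattern against them as well. That is why *.txt also returns 2.txtx and 3.txtxt.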
Since the command prompt on Windows evolved from the old MS-DOS command interpreter, some "anachronistic" behaviors (like the 3-letter search pattern) were kept so that applications and scripts built in the "old days" or by "old timers" wouldn't break.
(Another example is that most Windows file systems are case-insensitive; yes, you guessed it, because the MS-DOS one didn't distinguish case.)
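The 8.3 explanation can be made concrete with a deliberately simplified model, sketched as a .NET 6+ top-level program. It rests on two assumptions. First, every long name that does not already fit 8.3 gets one short alias built from up to six characters of the base name, "~1", and the first three characters of the extension. Second, a pattern counts as matching if it matches either the long name or that alias. The real Windows alias algorithm and matcher do more, such as stripping invalid characters, numbering collisions, and applying the DOS wildcard quirks. ToShortName, WildcardMatch, and Matches are hypothetical names. Under this model, the question's counts fall out: *.txt hits all three files, *.txtx only one.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical, simplified 8.3 alias: NOT the real Windows algorithm
// (which also strips invalid characters, numbers collisions, etc.).
static string ToShortName(string longName)
{
    int dot = longName.LastIndexOf('.');
    string baseName = dot < 0 ? longName : longName.Substring(0, dot);
    string ext = dot < 0 ? "" : longName.Substring(dot + 1);
    if (baseName.Length <= 8 && ext.Length <= 3)
        return longName.ToUpperInvariant(); // already fits 8.3
    string shortBase = baseName.Substring(0, Math.Min(6, baseName.Length)) + "~1";
    string shortExt = ext.Substring(0, Math.Min(3, ext.Length));
    return (shortExt.Length > 0 ? shortBase + "." + shortExt : shortBase).ToUpperInvariant();
}

// Plain wildcard match: '*' = any run of characters, '?' = exactly one, case-insensitive.
static bool WildcardMatch(string pattern, string name) =>
    Regex.IsMatch(name,
        "^" + Regex.Escape(pattern).Replace(@"\*", ".*").Replace(@"\?", ".") + "$",
        RegexOptions.IgnoreCase);

// Model: a file is returned if the pattern matches its long name OR its short alias.
static bool Matches(string pattern, string longName) =>
    WildcardMatch(pattern, longName) || WildcardMatch(pattern, ToShortName(longName));

string[] files = { "1.txt", "2.txtx", "3.txtxt" };
foreach (string pattern in new[] { "*.txt", "*.txtx" })
{
    string[] hits = files.Where(f => Matches(pattern, f)).ToArray();
    Console.WriteLine($"{pattern}: {string.Join(", ", hits)}");
}
// prints "*.txt: 1.txt, 2.txtx, 3.txtxt" then "*.txtx: 2.txtx"
```

The asymmetry comes from truncation. A 4-letter pattern extension can never match a 3-letter alias extension, while a 3-letter one matches the alias of every longer extension that starts with the same letters.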
Upvotes: 2
Reputation: 63338
I'd be willing to wager it has something to do with backward compatibility. I don't see this exact issue mentioned, but this Raymond Chen blog post mentions a number of oddities in this area:
[...] some quirks of the FCB matching algorithm persist into Win32 because they have become idiom.
For example, if your pattern ends in .*, the .* is ignored. Without this rule, the pattern *.* would match only files that contained a dot, which would break probably 90% of all the batch files on the planet, as well as everybody's muscle memory, since everybody running Windows NT 3.1 grew up in a world where *.* meant all files.
As another example, a pattern that ends in a dot doesn't actually match files which end in a dot; it matches files with no extension. And a question mark can match zero characters if it comes immediately before a dot.
Upvotes: 0