Lori

Reputation: 1

Windows / NTFS: Two files with identical long-names in the same directory?

I have been a lurker at stackoverflow.com for many years (great site and users here), but never had the need to ask a question. Now the time has come :-) Let me begin:

OS: x64 Windows 8.0 to Windows 10 (15063.14) (the issue has existed for years, but I have never fully pursued it, so we can rule out that it is specific to a particular Windows version)

FS: NTFS

Issue: 2 files with the same (long) name in the same directory, and I cannot figure out how this is even possible. This has been happening to me for years whenever I manually upgrade my email client. When I copy the new main .EXE file (MailClient.exe) into the same directory, Windows never asks whether to replace the old one. Instead, both end up there with exactly the same long name.

The issue has nothing to do with a specific directory: I can copy both .EXE files to freshly created directories on the NTFS drive without problems (and get no "overwrite" prompt there either).

Let me show you:

C:\temp\2>dir
 Volume in drive C is SSD 840 Pro
 Volume Serial Number is 0C6D-D489

 Directory of C:\temp\2

13.04.2017  02:29    <DIR>          .
13.04.2017  02:29    <DIR>          ..
21.10.2016  17:10        24.742.760 MailClient.exe
27.12.2016  03:26        24.911.872 MailCliеnt.exe
               2 File(s)     49.654.632 bytes
               2 Dir(s)  78.503.038.976 bytes free

However, dir /x shows this:

C:\temp\2>dir /x
 Volume in drive C is SSD 840 Pro
 Volume Serial Number is 0C6D-D489

 Directory of C:\temp\2

13.04.2017  02:29    <DIR>                       .
13.04.2017  02:29    <DIR>                       ..
21.10.2016  17:10        24.742.760 MAILCL~2.EXE MailClient.exe
27.12.2016  03:26        24.911.872 MAILCL~1.EXE MailCliеnt.exe
               2 File(s)     49.654.632 bytes
               2 Dir(s)  78.503.038.976 bytes free

So they obviously have different 8.3 names, but exactly the same long name. Both files also show the same location in the Windows "Properties" dialog (right-click). I wanted to post a screenshot of this, but it seems I am not allowed to post images yet, so you will have to take my word for it.

I cannot figure out how this is possible and it is bugging me ;) As soon as I rename both files to, for example, 1.exe, Windows tells me that a file with that name already exists in the directory. So it obviously has something to do with the file name, but the names look exactly identical (no extra spaces, nothing), as you can see from the DIR output.

I've also tried renaming them and manually retyping the exact name "MailClient.exe" for both, to make sure the characters are EXACTLY the same. Windows still doesn't complain, and both end up there under the same name once again. However, renaming them to "Mail.exe" and "Mail.exe" does NOT work: Windows then says that another file with that name already exists. Yet naming them both back to "MailClient.exe" is absolutely fine, with no complaints from Windows.

Another fun fact: if I run dir for mailclient.exe directly, this happens:

C:\temp\2>dir mailclient.exe
 Volume in drive C is SSD 840 Pro
 Volume Serial Number is 0C6D-D489

 Directory of C:\temp\2

21.10.2016  17:10        24.742.760 MailClient.exe
               1 File(s)     24.742.760 bytes
               0 Dir(s)  78.501.998.592 bytes free

However, if I look for *.exe, this happens:

C:\temp\2>dir *.exe
 Volume in drive C is SSD 840 Pro
 Volume Serial Number is 0C6D-D489

 Directory of C:\temp\2

21.10.2016  17:10        24.742.760 MailClient.exe
27.12.2016  03:26        24.911.872 MailCliеnt.exe
               2 File(s)     49.654.632 bytes
               0 Dir(s)  78.501.990.400 bytes free

This also yields interesting results:

C:\temp\2>ren mailclient.exe *.bak

C:\temp\2>dir
 Volume in drive C is SSD 840 Pro
 Volume Serial Number is 0C6D-D489

 Directory of C:\temp\2

13.04.2017  02:50    <DIR>          .
13.04.2017  02:50    <DIR>          ..
21.10.2016  17:10        24.742.760 MailClient.bak
27.12.2016  03:26        24.911.872 MailCliеnt.exe
               2 File(s)     49.654.632 bytes
               2 Dir(s)  78.501.990.400 bytes free

And back:

C:\temp\2>ren mailclient.bak MailClient.exe

C:\temp\2>dir
 Volume in drive C is SSD 840 Pro
 Volume Serial Number is 0C6D-D489

 Directory of C:\temp\2

13.04.2017  02:51    <DIR>          .
13.04.2017  02:51    <DIR>          ..
21.10.2016  17:10        24.742.760 MailClient.exe
27.12.2016  03:26        24.911.872 MailCliеnt.exe
               2 File(s)     49.654.632 bytes
               2 Dir(s)  78.501.982.208 bytes free
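All of the dir and ren results above are consistent with Windows comparing names case-insensitively, code point by code point. Below is a minimal Python sketch of that idea, not how Windows is implemented: dir_matches is a hypothetical helper, str.upper() only approximates the upcase table NTFS really uses, and fnmatch stands in for cmd's wildcard rules (it treats [...] differently). For illustration, the second name is assumed to contain a Cyrillic "е" (U+0435) instead of the Latin "e":

```python
import fnmatch

def dir_matches(pattern: str, name: str) -> bool:
    """Hypothetical helper: case-insensitive match of a dir/ren pattern.

    str.upper() stands in for the volume's upcase table, and fnmatch stands
    in for cmd's wildcard rules; both are approximations.
    """
    return fnmatch.fnmatchcase(name.upper(), pattern.upper())

files = ["MailClient.exe", "MailCli\u0435nt.exe"]  # second name: Cyrillic U+0435

# 'dir mailclient.exe' / 'ren mailclient.exe *.bak': only the pure-ASCII name
# matches, because U+0435 upper-cases to U+0415 (Cyrillic), never to Latin 'E'.
print([f for f in files if dir_matches("mailclient.exe", f)])  # ['MailClient.exe']

# 'dir *.exe': the wildcard ignores the 8th character, so both names match.
print([dir_matches("*.exe", f) for f in files])  # [True, True]

# No wildcards here, so this is plain case-insensitive equality:
# the names do not conflict, and both files can coexist.
print(dir_matches(files[0], files[1]))  # False
```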

I've also checked the permissions on the files and taken ownership; it changes nothing. Additionally, I've cleared the NTFS journal and even the transaction log, and run chkdsk, which reports no errors either.

Any ideas on this mysterious situation? What am I missing?

Thanks so much:)

UPDATE #1:

I've just tried this: in Windows Explorer, I renamed both files one after the other, truncating their names step by step. I first renamed the first "MailClient.exe" to "MailClien.exe", then the second "MailClient.exe" to "MailClien.exe". Again, Windows did not say that they have the same name; it just renamed both fine. I then continued with "MailClie.exe". That worked too. However, as soon as I tried to rename both to "MailCli.exe", Windows complained that another file with that name already exists. Trying to rename both back to "MailClient.exe" from there also fails: it works for only one of them, because Windows then says (rightly so) that a file with that name already exists. So it seems to come down to the "e" possibly being a different ANSI character in the two file names? However, I don't know of another character for "e", or am I missing something?
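The truncation experiment above is a manual search for the first character that differs. Here is a minimal Python sketch that does the same search directly. first_difference is a hypothetical helper, and for illustration the second name is assumed to contain a Cyrillic "е" (U+0435) in place of the Latin "e":

```python
import unicodedata

def first_difference(a: str, b: str):
    """Return the 1-based position of the first differing character, or None if equal."""
    for pos, (ca, cb) in enumerate(zip(a, b), start=1):
        if ca != cb:
            return pos
    # One name is a prefix of the other: they differ right after the shorter one ends.
    return None if len(a) == len(b) else min(len(a), len(b)) + 1

a = "MailClient.exe"
b = "MailCli\u0435nt.exe"
pos = first_difference(a, b)
print(pos)  # 8
for ch in (a[pos - 1], b[pos - 1]):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+0065 LATIN SMALL LETTER E
# U+0435 CYRILLIC SMALL LETTER IE
```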

Upvotes: 0

Views: 2090

Answers (1)

JosefZ

Reputation: 30113

Harry Johnston is right: one of the filenames contains a Unicode character that just looks the same as an ANSI character.
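To see why such names fool the eye, here is a small Python sketch that compares a few Latin letters with look-alikes from other scripts. The Cyrillic and Greek characters are chosen to match the ones that appear in the examples in this answer. ascii() escapes every non-ASCII code point, which makes the hidden character visible:

```python
import unicodedata

# (Latin letter, look-alike from another script)
pairs = [("e", "\u0435"), ("M", "\u041C"), ("H", "\u041D"), ("H", "\u0397")]

for latin, other in pairs:
    print(f"{latin} U+{ord(latin):04X} vs {other} U+{ord(other):04X} "
          f"({unicodedata.name(other)}), equal: {latin == other}")

# ascii() escapes non-ASCII code points, revealing the hidden character.
print(ascii("MailCli\u0435nt.exe"))  # 'MailCli\u0435nt.exe'
```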

Read Naming Files, Paths, and Namespaces:

On newer file systems, such as NTFS, exFAT, UDFS, and FAT32, Windows stores the long file names on disk in Unicode, which means that the original long file name is always preserved. This is true even if a long file name contains extended characters, regardless of the code page that is active during a disk read or write operation.
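The quoted passage says that the long name is stored in Unicode. The following short Python sketch assumes that the on-disk form is UTF-16 little-endian code units, which is my understanding of how NTFS stores long names. It shows that the two names differ in exactly one 16-bit unit even though they render the same:

```python
a = "MailClient.exe"
b = "MailCli\u0435nt.exe"

ua = a.encode("utf-16-le")
ub = b.encode("utf-16-le")

# Compare two bytes (one UTF-16 code unit) at a time; report 1-based positions.
diffs = [i // 2 + 1 for i in range(0, len(ua), 2) if ua[i:i + 2] != ub[i:i + 2]]
print(diffs)                             # [8]
print(ua[14:16].hex(), ub[14:16].hex())  # 6500 3504
```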

Use the following PowerShell script 43381802b.ps1 to detect and show non-ANSI file names (see different calls below):

param( [string[]]$Path = '.',
       [switch]$Cpp,  ### list any non-ANSI character in file names like a C++ literal
                      ### i.e. a prefix \u followed by a four digit Unicode code point
       [switch]$All   ### list all files including pure ANSI-encoded file names
      )
Set-StrictMode -Version latest
$strArr = @(Get-ChildItem -Path $Path)   ### @() keeps .Count valid when 0 or 1 items are returned
$arrDiff = @()
for ($i=0; $i -lt $strArr.Count; $i++) {
    $strDiff = 'ANSI'
    $strName = ''
    $auxName = $strArr[$i].Name
    for (  $k=0; $k -lt $auxName.Length; $k++ ) {
        if ( [int][char]$auxName[$k] -gt 255 ) {
            $strDiff  = 'UCS2'
            $strName += '\u{0:X4}' -f [int][char]$auxName[$k]
        } else { 
            $strName += $auxName[$k]
        }
    }
    if ( $All.IsPresent -or $strDiff -eq 'UCS2' ) { 
        $strArr[$i] | Add-Member NoteProperty Code    $strDiff
        $strArr[$i] | Add-Member NoteProperty CppName $strName
        $arrDiff += $strArr[$i]
    }
}
if ( $Cpp.IsPresent ) {
    $arrDiff | Select-Object -Property Code, Mode, LastWriteTime, Length, CppName | ft
} else {
    $arrDiff | Select-Object -Property Code, Mode, LastWriteTime, Length, Name | ft
}
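For readers without PowerShell, here is a rough Python equivalent of 43381802b.ps1. It is a sketch rather than a line-for-line port, and classify, cpp_name and scan are hypothetical names. It uses the same rule as the script: a name counts as "UCS2" if any code point is above 255. One difference is that Python indexes by code point, whereas PowerShell's [char] is a UTF-16 unit. As a result, a character outside the Basic Multilingual Plane prints here as one longer escape instead of two surrogate escapes:

```python
import os
import sys

def classify(name: str) -> str:
    """'UCS2' if any code point is above 255, else 'ANSI' (the script's rule)."""
    return "UCS2" if any(ord(ch) > 255 for ch in name) else "ANSI"

def cpp_name(name: str) -> str:
    """Escape code points above 255 as \\uXXXX, like the script's -Cpp switch."""
    return "".join(f"\\u{ord(ch):04X}" if ord(ch) > 255 else ch for ch in name)

def scan(path: str = ".", show_all: bool = False, cpp: bool = False) -> None:
    """Print the code and the name (or escaped name) for each entry in path."""
    with os.scandir(path) as it:
        entries = sorted(it, key=lambda e: e.name)
    for entry in entries:
        code = classify(entry.name)
        if show_all or code == "UCS2":
            print(code, cpp_name(entry.name) if cpp else entry.name)

if __name__ == "__main__":
    # Usage: python non_ansi.py [path] [--cpp] [--all]
    args = [a for a in sys.argv[1:] if not a.startswith("--")]
    scan(args[0] if args else ".", show_all="--all" in sys.argv, cpp="--cpp" in sys.argv)
```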

Output:

PS D:\PShell> .\SO\43381802b.ps1 'C:\testC\43381802'

Code Mode   LastWriteTime       Length Name
---- ----   -------------       ------ ----
UCS2 -a---- 02/05/2017 11:47:53    317 MailCliеnt.txt
UCS2 -a---- 02/05/2017 11:49:04    317 МailClient.txt
UCS2 -a---- 02/05/2017 11:50:16    399 МailCliеnt.txt

PS D:\PShell> .\SO\43381802b.ps1 'C:\testC\43381802' -Cpp

Code Mode   LastWriteTime       Length CppName
---- ----   -------------       ------ -------
UCS2 -a---- 02/05/2017 11:47:53    317 MailCli\u0435nt.txt
UCS2 -a---- 02/05/2017 11:49:04    317 \u041CailClient.txt
UCS2 -a---- 02/05/2017 11:50:16    399 \u041CailCli\u0435nt.txt


PS D:\PShell> .\SO\43381802b.ps1 'C:\testC\43381802' -Cpp -All

Code Mode   LastWriteTime       Length CppName
---- ----   -------------       ------ -------
ANSI -a---- 02/05/2017 11:44:05    235 MailClient.txt
UCS2 -a---- 02/05/2017 11:47:53    317 MailCli\u0435nt.txt
UCS2 -a---- 02/05/2017 11:49:04    317 \u041CailClient.txt
UCS2 -a---- 02/05/2017 11:50:16    399 \u041CailCli\u0435nt.txt

Use the following 43381802a.ps1 script to get more information about non-ANSI characters (see the first call below) and their position in file names (see the second call below, with the -Detail switch):

param(  [string[]] $strArr = @('ΗGreek', 'НCyril', 'HLatin'),
        [switch]$Detail )
Set-StrictMode -Version latest
$auxArr = @()
if ( ( Get-Command -Name Get-CharInfo -ErrorAction SilentlyContinue ) -and 
     ( -not $Detail.IsPresent ) ) {
    $auxArr = $strArr | Get-CharInfo | 
        Where-Object { [int]$_.Codepoint.Replace('U+', '0x') -ge 128 }
} else {
    foreach ($strStr in $strArr) {
        for ($i = 0; $i -lt $strStr.Length; $i++ ) {
            if ( [int][char]$strStr[$i] -ge  128 ) {
                $auxArr += [PSCustomObject] @{
                    Char        = $strStr[$i]
                    CodePoint   = 'U+{0:x4}' -f [int][char]$strStr[$i]
                    Category    = $i + 1                   ### 1-based index
                    Description = $strStr                  ### string itself
                }
            }
        }
    }
}
$auxArr
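Similarly, here is a rough Python analogue of the -Detail mode of 43381802a.ps1. It uses the standard unicodedata module instead of Get-CharInfo, and char_info is a hypothetical name. Note that unicodedata.category returns short codes such as Ll and Lu rather than the LowercaseLetter/UppercaseLetter names shown above:

```python
import unicodedata

def char_info(names):
    """Yield one row per code point >= 128 (the script's threshold):
    (char, 'U+XXXX', 1-based position, Unicode name, category, source string)."""
    for name in names:
        for pos, ch in enumerate(name, start=1):
            if ord(ch) >= 128:
                yield (ch, f"U+{ord(ch):04X}", pos,
                       unicodedata.name(ch, "<unnamed>"),
                       unicodedata.category(ch), name)

# Sample strings modeled on the script's defaults: Greek Eta, Cyrillic En, Latin H.
for row in char_info(["\u0397Greek", "\u041DCyril", "HLatin"]):
    print(*row)
```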

Output:

PS D:\PShell> .\SO\43381802a.ps1 ( Get-childitem -path 'C:\testC\43381802' ).Name

Char CodePoint        Category Description
---- ---------        -------- -----------
   е U+0435    LowercaseLetter Cyrillic Small Letter Ie
   М U+041C    UppercaseLetter Cyrillic Capital Letter Em
   М U+041C    UppercaseLetter Cyrillic Capital Letter Em
   е U+0435    LowercaseLetter Cyrillic Small Letter Ie


PS D:\PShell> .\SO\43381802a.ps1 ( Get-childitem -path 'C:\testC\43381802' ).Name -detail

Char CodePoint Category Description
---- --------- -------- -----------
   е U+0435           8 MailCliеnt.txt
   М U+041c           1 МailClient.txt
   М U+041c           1 МailCliеnt.txt
   е U+0435           8 МailCliеnt.txt

Tested on files:

==> dir /-C /X /A-D C:\testC\43381802\
 Volume in drive C has no label.
 Volume Serial Number is …

 Directory of C:\testC\43381802

02/05/2017  11:44               235 MAILCL~1.TXT MailClient.txt
02/05/2017  11:47               317 MAILCL~2.TXT MailCliеnt.txt
02/05/2017  11:49               317 AILCLI~1.TXT МailClient.txt
02/05/2017  11:50               399 AILCLI~2.TXT МailCliеnt.txt
               4 File(s)           1268 bytes
               0 Dir(s)     69914857472 bytes free

==> 

Upvotes: 1
