Reputation: 380
Every month I receive a collection of files from a vendor to process. The files have no extension, but they follow a consistent naming convention. The missing extension causes problems because the batch sometimes contains a compressed folder with the same name as a file that my code is meant to process. So I'm looking for a way to run a Boolean check on each file to confirm whether it is actually a compressed folder. The twist is that Excel files can also be opened like a compressed folder (and contain docProps, xl, and _rels).
I've been trying Get-ChildItem and Get-Content without success. Is there a way to run a "test" on a file that returns true if it is actually a zip file?
Upvotes: 4
Views: 7450
Reputation: 914
Following on from the other answer, I ran into some issues with the posted script because of the '!' characters in the If statements. Here's what worked for me.
My use case was recovering DOCX and XLSX files after a friend's hard drive was corrupted by a power loss. They were left with a bunch of CHK files, which the 'unchk' utility renamed to zip files.
Note the temp save directory mentioned here: this process needs to extract your zips to read their contents, so this code expects a folder of zips to run through, then extracts each one into its own folder in a separate directory for temp files.
$FolderPath = "C:\FOUND.001\"
Get-ChildItem $FolderPath -Filter *.zip |
    ForEach-Object {
        #Write-Output $_.FullName
        #If ($_.FullName -eq "C:\FOUND.001\FILE3640.zip"){ TEST LINE, REMOVED AFTER SUCCESSFULLY WORKING ON 1 FILE OF THOUSANDS...
        $FullFileName = $_.FullName
        Write-Output $FullFileName
        $ZipName = $_.BaseName
        $TempSaveDirectory = "C:\RecoveryResults\$ZipName"
        Write-Output "Creating $TempSaveDirectory"
        # Make a directory to hold the zip contents; Out-Null keeps the new directory object out of the output
        New-Item $TempSaveDirectory -ItemType Directory | Out-Null
        # Quote both paths so that names containing spaces reach 7-Zip intact
        $ErrorStatus = Start-Process -FilePath "C:\Program Files\7-Zip\7z.exe" -ArgumentList "x `"$FullFileName`" `"-o$TempSaveDirectory`"" -PassThru -Wait
        If (($ErrorStatus.ExitCode -eq 0) -or ($null -eq $ErrorStatus.ExitCode))
        {
            # -Name returns the top-level entry names as plain strings
            $FileContents = Get-ChildItem -Path $TempSaveDirectory -Name
            # _rels is a folder in a Word or Excel file when you open it with 7-zip. If it is able to be opened with 7-zip and
            # doesn't have that folder, consider the file to be a zip file.
            if ($FileContents -contains "_rels")
            {
                Write-Output "Found rels folder"
                if ($FileContents -contains "xl")
                {
                    Rename-Item $FullFileName "$FullFileName.xlsx"
                    Write-Output "Renamed $FullFileName To Excel"
                }
                elseif ($FileContents -contains "word")
                {
                    Rename-Item $FullFileName "$FullFileName.docx"
                    Write-Output "Renamed $FullFileName To Word Doc"
                }
            }
            # No _rels folder: an ordinary zip. Every input already ends in .zip (see -Filter above), so there is nothing to rename.
        }
    }
#}
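As a possible alternative to extracting every archive into a temp folder, the entry names can be read straight from the archive with the .NET System.IO.Compression.ZipFile class (this assumes PowerShell 3+ on .NET 4.5 or later). The sketch below is not part of the script above: Get-ArchiveKind is a hypothetical helper name, and the 'Excel' / 'Word' / 'Zip' / 'NotZip' labels are my own choice. It applies the same _rels / xl / word rule to the entry paths.

```powershell
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Hypothetical helper: classify a file by its zip entry names without extracting it.
function Get-ArchiveKind {
    param([Parameter(Mandatory)][string]$Path)

    $zip = $null
    try {
        # OpenRead throws if the file is not a readable zip archive
        $zip   = [System.IO.Compression.ZipFile]::OpenRead($Path)
        $names = @($zip.Entries | ForEach-Object { $_.FullName })
    }
    catch {
        return 'NotZip'
    }
    finally {
        if ($zip) { $zip.Dispose() }
    }

    # Entry paths inside a zip use forward slashes
    if ($names -like '_rels/*') {
        if ($names -like 'xl/*')   { return 'Excel' }
        if ($names -like 'word/*') { return 'Word' }
    }
    return 'Zip'
}
```

Pass a full path (for example $_.FullName inside ForEach-Object), because .NET resolves relative paths against the process working directory rather than the current PowerShell location.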
Upvotes: 0
Reputation: 380
I ended up doing something like this, hopefully someone finds it useful:
# $FullFileName and $FileName are assumed to be set earlier in the script.
$ErrorStatus = Start-Process -FilePath $env:7ZipExe -ArgumentList "x `"$FullFileName`" `"-o$env:ProcessingFolder`"" -PassThru -Wait
If (($ErrorStatus.ExitCode -eq 0) -or ($null -eq $ErrorStatus.ExitCode))
{
    # List what was extracted (assumes $env:ProcessingFolder holds only this file's contents)
    $FileContents = Get-ChildItem -Path $env:ProcessingFolder -Name
    #_rels is a folder in a Word or Excel file when you open it with 7-zip. If it is able to be opened with 7-zip and
    # doesn't have that folder, consider the file to be a zip file.
    if ($FileContents -contains "_rels")
    {
        if (($FileContents -contains "xl") -and !($FullFileName.ToLower().EndsWith('.xlsx')))
        {
            Rename-Item $FullFileName "$FullFileName.xlsx"
        }
        elseif (($FileContents -contains "word") -and !($FullFileName.ToLower().EndsWith('.docx')) -and !($FullFileName.ToLower().EndsWith('.xlsx')))
        {
            Rename-Item $FullFileName "$FullFileName.docx"
        }
    }
    else
    {
        If (!($FileName.ToLower().EndsWith(".zip")))
        {
            Rename-Item $FullFileName "$FullFileName.zip"
            Add-Content $env:LogReportFile "$(Get-Date) - $FileName was a zip file. Added extension."
        }
    }
}
Upvotes: 1
Reputation: 28154
The Carbon PowerShell module includes a cmdlet, Test-ZipFile, which will tell you whether a file is a zip file or not.
If you can't use that module, you can look at the file header. This is a little ugly (short on time) but works:
$contents = [string](get-content -raw -Encoding Unknown -path $filepath).ToCharArray();
[convert]::tostring([convert]::toint32($contents[0]),16);
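The output is 4b50 for a file which is known to be a ZIP file. A ZIP file begins with the bytes 50 4B ("PK"); read as a single 16-bit little-endian character they come out as 4b50, which is the first two bytes of the signature, reversed.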
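For what it's worth, the same header check can be done on raw bytes, which avoids the text-encoding detour. This is only a sketch: Test-ZipSignature is a hypothetical helper name, and it checks only for the usual local-file-header signature 50 4B 03 04. An empty archive starts with 50 4B 05 06 instead, so this would report $false for it.

```powershell
# Hypothetical helper: $true when the file starts with the bytes 50 4B 03 04 ("PK" 3 4).
function Test-ZipSignature {
    param([Parameter(Mandatory)][string]$Path)

    $expected = [byte[]](0x50, 0x4B, 0x03, 0x04)
    $buffer   = New-Object byte[] 4

    # Resolve to a full file-system path, since .NET ignores the PowerShell location
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $stream   = [System.IO.File]::OpenRead($fullPath)
    try {
        $read = $stream.Read($buffer, 0, 4)
    }
    finally {
        $stream.Dispose()
    }

    # Files shorter than four bytes cannot carry the signature
    if ($read -lt 4) { return $false }
    for ($i = 0; $i -lt 4; $i++) {
        if ($buffer[$i] -ne $expected[$i]) { return $false }
    }
    return $true
}
```

Something like Get-ChildItem $FolderPath | Where-Object { -not $_.PSIsContainer -and (Test-ZipSignature $_.FullName) } would then pick out the candidates ($FolderPath being whatever folder you process). An .xlsx or .docx file passes this check too, because it carries the same signature.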
Longer term, make the vendor fix their system to provide more information about the files. Especially if they're the type you want.
If you need to distinguish between Excel (2007+) and true ZIP files without an extension, the header won't help you. As you already know, you can rename an .xlsx file to .zip and it will open like any other ZIP file, because the signature is identical. The only way to tell them apart is to look at the entries inside the archive.
Upvotes: 4