Bandit

Reputation: 523

PowerShell: search for a string in files and compare across files to find duplicates

The script below searches for the string 'Package ID=' in files located inside VIP and ZIP files. Each VIP contains exactly one vip.manifest, which holds at least one GUID for Package ID. A ZIP file contains a VIP file. As you can see, the content is extracted to a temp folder and deleted at the end.

My path contains many VIPs and ZIPs, and I need to know whether there are duplicates, that is, whether more than one manifest holds the same GUID, and to display which files contain the duplicate. When I run the script, I can see all the GUIDs from all the ZIPs/VIPs in the path.

function checkpackageID([string]$_path)
{
Add-Type -AssemblyName System.IO.Compression, System.IO.Compression.FileSystem

$path = $_path
$tempFolder = Join-Path ([IO.Path]::GetTempPath()) (New-GUID).ToString('n')
$compressedfiles = Get-ChildItem -path $path\* -Include "*.vip","*.zip"

foreach ($file in $compressedfiles) 
{   
    if ($file -like "*.zip")
    {
     try 
     { 
        $zip = [System.IO.Compression.ZipFile]::ExtractToDirectory($file, $tempFolder)
        $test = Get-ChildItem -path $tempFolder\* -Include "*.vip" 
       
        if ($test)
        {
            $zip2 = [System.IO.Compression.ZipFile]::ExtractToDirectory($test, $tempFolder)
            $guidmaps = Get-ChildItem $tempFolder -Include "*.manifest" -Recurse
            write-host    
            foreach($guidmap in $guidmaps) 
            {
                switch -Regex -File($guidmap) {
                    '(?<=<Package ID=")(?<guid>[\d\w-]+)"' {
                        [pscustomobject]@{
                            Guid = $Matches['guid']
                            Path = $guidmap.FullName
                        }
                    }
                }
            }
            $guidmap = $guidmap | Group-Object Guid | Where-Object Count -GT 1 | ForEach-Object Group
              
            }

        $guidmap
     }
     catch 
     {
            Write-Warning $_.Exception.Message
            continue
     }
     finally 
     {
               Remove-Item $tempFolder -Force -Recurse
     }
    }
    elseif ($file -like "*.vip") #vip
    {
     try 
     { 
        $zip = [System.IO.Compression.ZipFile]::ExtractToDirectory($file, $tempFolder)
        $guidmaps = Get-ChildItem $tempFolder -Include "*.manifest" -Recurse
        write-host
        foreach($guidmap in $guidmaps) 
        {            
            switch -Regex -File($guidmap) {
                '(?<=<Package ID=")(?<guid>[\d\w-]+)"' {
                    [pscustomobject]@{
                        Guid = $Matches['guid']
                        Path = $guidmap.FullName
                    }
                }
            }
        }
        $guidmap = $guidmap | Group-Object Guid | Where-Object Count -GT 1 | ForEach-Object Group
        $guidmap  
     }
        
     catch 
     {
            Write-Warning $_.Exception.Message
            continue
     }
     finally 
     {
               Remove-Item $tempFolder -Force -Recurse
     }  
    }
     
    }

} 

Upvotes: 0

Views: 110

Answers (1)

Santiago Squarzon

Reputation: 60145

Instead of extracting all .manifest files to a folder from your .zip and .vip, you can read the entries directly in memory. Assuming there could be .vip files contained in the .zip, one approach would be to use a recursive function that will search for all the .manifest files. Once all GUIDs have been extracted using the function, the logic using Group-Object would remain the same.
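
To illustrate the in-memory idea in isolation, here is a minimal sketch that never touches the disk. It builds a sample .zip holding a nested .vip, then opens the nested entry as its own archive and reads the manifest from it. The helper New-SampleZipBytes, the entry names, and the GUID are made up for the demonstration; only the regex comes from the question.

```powershell
Add-Type -AssemblyName System.IO.Compression

# Hypothetical helper: returns the bytes of a zip archive holding a single entry
function New-SampleZipBytes([string] $EntryName, [byte[]] $Content) {
    $stream = [System.IO.MemoryStream]::new()
    # leaveOpen = $true so the MemoryStream survives disposing the archive
    $zip = [System.IO.Compression.ZipArchive]::new(
        $stream, [System.IO.Compression.ZipArchiveMode]::Create, $true)
    $entryStream = $zip.CreateEntry($EntryName).Open()
    $entryStream.Write($Content, 0, $Content.Length)
    $entryStream.Dispose()
    $zip.Dispose()   # writes the central directory into $stream
    , $stream.ToArray()
}

# Sample data (assumed): a manifest inside a .vip, and that .vip inside a .zip
$manifest = [System.Text.Encoding]::UTF8.GetBytes(
    '<Package ID="11111111-2222-3333-4444-555555555555" />')
$vipBytes = New-SampleZipBytes -EntryName 'vip.manifest' -Content $manifest
$zipBytes = New-SampleZipBytes -EntryName 'inner.vip' -Content $vipBytes

# Open the outer archive, then open the nested .vip entry as another archive
$outer  = [System.IO.Compression.ZipArchive]::new(
    [System.IO.MemoryStream]::new([byte[]] $zipBytes))
$inner  = [System.IO.Compression.ZipArchive]::new(
    $outer.GetEntry('inner.vip').Open())
$reader = [System.IO.StreamReader]::new($inner.GetEntry('vip.manifest').Open())

$found = if ($reader.ReadToEnd() -match '(?<=<Package ID=")(?<guid>[\d\w-]+)"') {
    $Matches['guid']
}

$reader.Dispose(); $inner.Dispose(); $outer.Dispose()
$found   # the GUID read from the nested manifest
```

The complete function below applies the same idea recursively to archives on disk.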

using namespace System.IO
using namespace System.IO.Compression

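# System.IO.Compression.FileSystem provides [ZipFile] on Windows PowerShell 5.1
Add-Type -AssemblyName System.IO.Compression, System.IO.Compression.FileSystem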

function Get-ManifestFile {
    [cmdletbinding()]
    param(
        [parameter(ValueFromPipeline, Mandatory)]
        [object] $Path,
        [string] $TargetExtension = '.manifest',
        [string] $Pattern = '(?<=<Package ID=")(?<guid>[\d\w-]+)"',
        [Parameter(DontShow)]
        [string] $Parent
    )

    process {

        try {
            if($Path -isnot [FileInfo]) {
                $zip = [ZipArchive]::new($Path.Open())
                $filePath = $Parent
            }
            else {
                $zip = [ZipFile]::OpenRead($Path.FullName)
                $filePath = $Path.FullName
            }

            foreach($entry in $zip.Entries) {
                # if the entry is a `manifest` file, read it
                if([Path]::GetExtension($entry) -eq $TargetExtension) {
                    try {
                        $handle = $entry.Open()
                        $reader = [StreamReader]::new($handle)
                        while(-not $reader.EndOfStream) {
                            if($reader.ReadLine() -match $Pattern) {
                                [pscustomobject]@{
                                    Guid         = $Matches['guid']
                                    FilePath     = $filePath
                                    ZipEntryPath = $entry.FullName
                                }
                            }
                        }
                    }
                    catch { $PSCmdlet.WriteError($_) }
                    finally {
                        ($reader, $handle).ForEach('Dispose')
                    }
                }
                # if the entry is a `.vip` file use recursion
                if([Path]::GetExtension($entry) -eq '.vip') {
                    Get-ManifestFile -Path $entry -Parent $filePath
                }
            }
        }
        catch { $PSCmdlet.WriteError($_) }
        finally {
            # disposing the archive also closes its underlying stream
            if($zip) { $zip.Dispose() }
        }
    }
}

$path = "Define Path Here!!!"
$result = Get-ChildItem $path\* -Include '*.vip', '*.zip' |
    Get-ManifestFile | Group-Object Guid | Where-Object Count -GT 1 |
        ForEach-Object Group

if(-not $result) {
    'No duplicates found.'
}
else { $result }
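
If you would rather see one row per duplicated GUID together with the files that contain it, the grouping step could be reshaped as in the sketch below. The sample records are made up and only mimic the shape of the objects the function emits (Guid, FilePath, ZipEntryPath); the Count and Files property names are my own choice.

```powershell
# Sample records shaped like the function's output (all values are made up)
$records = @(
    [pscustomobject]@{ Guid = 'aaaa-1111'; FilePath = 'C:\pkgs\one.vip';   ZipEntryPath = 'vip.manifest' }
    [pscustomobject]@{ Guid = 'bbbb-2222'; FilePath = 'C:\pkgs\two.vip';   ZipEntryPath = 'vip.manifest' }
    [pscustomobject]@{ Guid = 'aaaa-1111'; FilePath = 'C:\pkgs\three.zip'; ZipEntryPath = 'vip.manifest' }
)

# Keep only GUIDs seen more than once and list the files that hold them
$duplicates = $records |
    Group-Object Guid |
    Where-Object Count -GT 1 |
    ForEach-Object {
        [pscustomobject]@{
            Guid  = $_.Name
            Count = $_.Count
            Files = $_.Group.FilePath -join ', '
        }
    }

$duplicates
```

Note that this counts occurrences, so a single manifest that repeats the same GUID twice would also be reported as a duplicate.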

Upvotes: 1
