Cody
Cody

Reputation: 112

Count and filter text files with Powershell

I have a collection of text files, each one containing a few thousand numbers on separate lines. I would like to use Command line or Powershell to output a file summarizing the data contained in each file, such as below, e.g. Counts.txt:

test.txt:          <--Filename
 (a) total: 4325    <-- Total number of lines in the file
 (b) isbn: 2        <-- Count of numbers that don't start with 3618*
 (c) duplicates: 13 <-- (a-b) - (Count of unique numbers that start with 3618*)

The above would be appended to Counts.txt for each file in the directory.

So far I have managed to get the total lines per file to Counts.txt using:

@echo off
setlocal enabledelayedexpansion

set cnt=0
set "out=Counts.txt"
if exist %out% del /q %out%
for /f %%a in ('dir /b /a-d') do (
    for /f %%b in ('type "%%a"^|find /v /c ""') do (
      set /a cnt=%%b & >>%out% echo(%%~nxa: "-total: %%b") 
    )
)

Which outputs:

test.txt: -total: 9

How do I get the Count of numbers that don't start with 3618* and the calculation for point (c)?

Some help on formatting would also be appreciated :)

Upvotes: 2

Views: 397

Answers (2)

Cody
Cody

Reputation: 112

Thanks to Theo's answer above, I managed to get the output I wanted using the below:

$result = Get-ChildItem -Path 'C:\Users\stefa\Desktop\test' -Filter '*.txt' -File | ForEach-Object {
    $data = Get-Content -Path $_.FullName
    $isbn = @($data | Where-Object { $_ -notlike '3618*' }).Count
    $unique = @($data | Where-Object { $_ -like '3618*' } | sort-object | Get-Unique).Count

    $dupes =($data.Count)-($isbn)-($unique)

    # output as PsObject
    [PsCustomObject]@{
        File       = $_.Name
        Isbn       = $isbn
        Duplicates = $dupes
        Unique = $unique
        Total      = $data.Count
    }
}

$result | Export-Csv -Path '.\Desktop\test\Counts.csv' -UseCulture -NoTypeInformation

Upvotes: 1

Theo
Theo

Reputation: 61123

It is a bit unclear what exactly you mean with the Count of unique duplicate numbers that start with 3618*, so below I have added two possible options for that, with one commented out. You can chose which count you need..

$result = Get-ChildItem -Path 'D:\Test' -Filter '*.txt' -File | ForEach-Object {
    $data = Get-Content -Path $_.FullName
    $isbn = @($data | Where-Object { $_ -like '97*' }).Count

    # if only numbers starting with 3618 that actually have exact duplicates (like 3618123 found multiple times), do this:
    $dupes = @($data | Where-Object { $_ -like '3618*' } | Group-Object | Where-Object {$_.Count -gt 1}).Count

    # if ALL numbers starting with 3618 are to be regarded as duplicates, use this instead:
    # $dupes = @($data | Where-Object { $_ -like '3618*' }).Count

    # output the data in the format you showed in the question
    @"
$($_.Name)
 a) total: $($data.Count)
 b) isbn: $isbn
 c) duplicates: $dupes

"@
}

Next, write the results to file

$result | Set-Content -Path '.\Counts.txt'

Result something like this:

numbers1.txt
 a) total: 10
 b) isbn: 2
 c) duplicates: 1

numbers2.txt
 a) total: 9
 b) isbn: 2
 c) duplicates: 0

Personally however, I would want the output as CSV file:

$result = Get-ChildItem -Path 'D:\Test' -Filter '*.txt' -File | ForEach-Object {
    $data = Get-Content -Path $_.FullName
    $isbn = @($data | Where-Object { $_ -like '97*' }).Count

    # if only numbers starting with 3618 that actually have exact duplicates (like 3618123 found multiple times), do this:
    $dupes = @($data | Where-Object { $_ -like '3618*' } | Group-Object | Where-Object {$_.Count -gt 1}).Count

    # if ALL numbers starting with 3618 are to be regarded as duplicates, use this instead:
    # $dupes = @($data | Where-Object { $_ -like '3618*' }).Count

    # output as PsObject
    [PsCustomObject]@{
        File       = $_.Name
        Total      = $data.Count
        Isbn       = $isbn
        Duplicates = $dupes
    }
}

$result | Export-Csv -Path '.\Counts.csv' -UseCulture -NoTypeInformation

Upvotes: 2

Related Questions