Reputation: 112
I have a collection of text files, each one containing a few thousand numbers on separate lines. I would like to use Command line or Powershell to output a file summarizing the data contained in each file, such as below, e.g. Counts.txt:
test.txt: <--Filename
(a) total: 4325 <-- Total number of lines in the file
(b) isbn: 2 <-- Count of numbers that don't start with 3618*
(c) duplicates: 13 <-- (a-b) - (Count of unique numbers that start with 3618*)
The above would be appended to Counts.txt for each file in the directory.
So far I have managed to get the total lines per file to Counts.txt using:
@echo off
setlocal enabledelayedexpansion
set cnt=0
set "out=Counts.txt"
if exist %out% del /q %out%
for /f %%a in ('dir /b /a-d') do (
for /f %%b in ('type "%%a"^|find /v /c ""') do (
set /a cnt=%%b & >>%out% echo(%%~nxa: "-total: %%b")
)
)
Which outputs:
test.txt: -total: 9
How do I get the Count of numbers that don't start with 3618* and the calculation for point (c)?
Some help on formatting would also be appreciated :)
Upvotes: 2
Views: 397
Reputation: 112
Thanks to Theo's answer above, I managed to get the output I wanted using the below:
$result = Get-ChildItem -Path 'C:\Users\stefa\Desktop\test' -Filter '*.txt' -File | ForEach-Object {
$data = Get-Content -Path $_.FullName
$isbn = @($data | Where-Object { $_ -notlike '3618*' }).Count
$unique = @($data | Where-Object { $_ -like '3618*' } | sort-object | Get-Unique).Count
$dupes =($data.Count)-($isbn)-($unique)
# output as PsObject
[PsCustomObject]@{
File = $_.Name
Isbn = $isbn
Duplicates = $dupes
Unique = $unique
Total = $data.Count
}
}
$result | Export-Csv -Path '.\Desktop\test\Counts.csv' -UseCulture -NoTypeInformation
Upvotes: 1
Reputation: 61123
It is a bit unclear what exactly you mean with the Count of unique duplicate numbers that start with 3618*, so below I have added two possible options for that, with one commented out. You can chose which count you need..
$result = Get-ChildItem -Path 'D:\Test' -Filter '*.txt' -File | ForEach-Object {
$data = Get-Content -Path $_.FullName
$isbn = @($data | Where-Object { $_ -like '97*' }).Count
# if only numbers starting with 3618 that actually have exact duplicates (like 3618123 found multiple times), do this:
$dupes = @($data | Where-Object { $_ -like '3618*' } | Group-Object | Where-Object {$_.Count -gt 1}).Count
# if ALL numbers starting with 3618 are to be regarded as duplicates, use this instead:
# $dupes = @($data | Where-Object { $_ -like '3618*' }).Count
# output the data in the format you showed in the question
@"
$($_.Name)
a) total: $($data.Count)
b) isbn: $isbn
c) duplicates: $dupes
"@
}
Next, write the results to file
$result | Set-Content -Path '.\Counts.txt'
Result something like this:
numbers1.txt
a) total: 10
b) isbn: 2
c) duplicates: 1
numbers2.txt
a) total: 9
b) isbn: 2
c) duplicates: 0
Personally however, I would want the output as CSV file:
$result = Get-ChildItem -Path 'D:\Test' -Filter '*.txt' -File | ForEach-Object {
$data = Get-Content -Path $_.FullName
$isbn = @($data | Where-Object { $_ -like '97*' }).Count
# if only numbers starting with 3618 that actually have exact duplicates (like 3618123 found multiple times), do this:
$dupes = @($data | Where-Object { $_ -like '3618*' } | Group-Object | Where-Object {$_.Count -gt 1}).Count
# if ALL numbers starting with 3618 are to be regarded as duplicates, use this instead:
# $dupes = @($data | Where-Object { $_ -like '3618*' }).Count
# output as PsObject
[PsCustomObject]@{
File = $_.Name
Total = $data.Count
Isbn = $isbn
Duplicates = $dupes
}
}
$result | Export-Csv -Path '.\Counts.csv' -UseCulture -NoTypeInformation
Upvotes: 2