Reputation: 11
I am trying to split a text file into two files, based on several strings, using PowerShell. The file sizes range from 5 KB to 15 KB.
The file data is formatted like the example below:
18600 - ABCD 2204 2020-04-11 00:00:00
18600 - ABCD 2204 2020-04-11 00:00:00
18600 - ABCD 2204 2020-04-11 00:00:00
18113 - ABCD 2204 2020-04-11 00:00:00
18113 - ABCD 2204 2020-04-11 00:00:00
19873 - ABCD 2204 2020-04-11 00:00:00
18764 - ABCD 2204 2020-04-11 00:00:00
19000 - ABCD 2204 2020-04-11 00:00:00
I need to split all rows that begin with 18600, 18113, 19000, etc. (or any set of specified 5 digits) into one file and all remaining lines of data that do not begin with those numbers (else) into a second file.
So the logic is: for each line in the file, if it begins with one of the specified numbers, write it to "file1"; otherwise, write it to "file2".
$file = (Get-Content myfile.txt)
ForEach ($line in $file) {
If ($line -match a set of strings)
{
$newfile = all lines with set of beginning strings
}
Else {
$line | Out-File -Append different file
}
}
I'm open to any other suggestions outside of PowerShell also. Thank you so much for your help.
Upvotes: 0
Views: 403
Reputation: 7489
presuming that you want all the lines that start with a number in the 18000..18999 range, this does the job ... [grin]
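what it does ...
- sets the source dir, the target number range, & the in/out file names
- creates a sample file to work with; when ready to do this for real, replace the entire #region/#endregion block with a call to Get-Content
- iterates thru the lines of the in-file
- converts the 1st space-delimited part of each line to an [int]
- tests whether that number is in the target range
- if it is, appends the line to the 18 file; otherwise, appends it to the not-18 file
the code ...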
$SourceDir = "$env:TEMP\WBCha"
$TargetNumberRange = 18000..18999
$InFile = Join-Path -Path $SourceDir -ChildPath 'InFile.txt'
$18OutFile = Join-Path -Path $SourceDir -ChildPath '18_OutFile.txt'
$Not_18OutFile = Join-Path -Path $SourceDir -ChildPath 'Not_18OutFile.txt'
#region >>> create a file to work with
#    when ready to do this for real, replace the whole "region" block with a Get-Content call
if (-not (Test-Path -LiteralPath $SourceDir))
{
    $Null = New-Item -Path $SourceDir -ItemType 'Directory' -ErrorAction 'SilentlyContinue'
}
$HowManyLines = 1e1
$Content = foreach ($Line in 0..$HowManyLines)
{
    $Prefix = @(18,19)[(Get-Random -InputObject @(0, 1))]
    '{0}{1:d3} - {2}' -f $Prefix, $Line, [datetime]::Now.ToString('yyyyy-MM-dd HH:mm:ss:ffff')
}
$Content |
    Set-Content -LiteralPath $InFile -ErrorAction 'SilentlyContinue'
#endregion >>> create a file to work with
foreach ($IF_Item in (Get-Content -LiteralPath $InFile))
{
    # the 1st space-delimited token is the leading number
    if ([int]$IF_Item.Split(' ')[0] -in $TargetNumberRange)
    {
        Add-Content -LiteralPath $18OutFile -Value $IF_Item
    }
    else
    {
        Add-Content -LiteralPath $Not_18OutFile -Value $IF_Item
    }
}
the 18 file content ...
18000 - 02020-07-10 12:29:45:6736
18001 - 02020-07-10 12:29:45:6736
18004 - 02020-07-10 12:29:45:6746
18005 - 02020-07-10 12:29:45:6756
18006 - 02020-07-10 12:29:45:6756
18008 - 02020-07-10 12:29:45:6766
18010 - 02020-07-10 12:29:45:6766
the not 18 file content ...
19002 - 02020-07-10 12:29:45:6746
19003 - 02020-07-10 12:29:45:6746
19007 - 02020-07-10 12:29:45:6756
19009 - 02020-07-10 12:29:45:6766
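if the numbers are a specific list instead of a range, the same idea can be sketched with the -in operator & the .Where() method in 'Split' mode. this is only a rough sketch: the $TargetPrefixes list & the inline sample lines are assumptions taken from the question, & the names $Matched and $Others are made up for illustration.

```powershell
# hypothetical prefix list taken from the question; edit to suit
$TargetPrefixes = @('18600', '18113', '19000')

# inline sample lines so the sketch is self-contained
# for real use, replace this with something like: $Lines = Get-Content -LiteralPath $InFile
$Lines = @(
    '18600 - ABCD 2204 2020-04-11 00:00:00'
    '18113 - ABCD 2204 2020-04-11 00:00:00'
    '19873 - ABCD 2204 2020-04-11 00:00:00'
    '18764 - ABCD 2204 2020-04-11 00:00:00'
    '19000 - ABCD 2204 2020-04-11 00:00:00'
)

# .Where() in 'Split' mode returns two collections:
#    the items that pass the test, then the items that fail it
$Matched, $Others = $Lines.Where({ $_.Split(' ')[0] -in $TargetPrefixes }, 'Split')

$Matched    # lines whose 1st token is in the prefix list
$Others     # all remaining lines
```

each collection can then be sent to its own file with Set-Content, which writes the file once instead of appending line by line.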
Upvotes: 1
Reputation: 2495
Assuming that you want to separate the rows that start with numbers into one file and the rows that do not into another, you can use the -match operator with a regex to test each row in your text file.
The code snippet goes something like this:
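# $fileData is assumed to hold the whole file as a single string (for example, read with Get-Content -Raw)
$processText = $fileData.Split([Environment]::NewLine,[StringSplitOptions]::RemoveEmptyEntries)
foreach($row in $processText)
{
    if($row -match "^\d") #Regex to check whether the first character of $row is a digit
    {
        # -Append adds to the file instead of overwriting it on every row
        $row | Out-File -FilePath "D:\DataStartingWithNum.text" -Append
    }
    else
    {
        $row | Out-File -FilePath "D:\DataStartingWithText.text" -Append
    }
}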
If you have any other conditions (which you might have left out of your question above), you can filter on any leading pattern in a similar way by using a suitable regex with the -match operator.
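For instance, if the condition is "starts with one of a few specific five-digit numbers", one possible regex is an anchored alternation built from the list. The following is a minimal sketch under assumptions: the $Prefixes list and the sample rows come from the question, and the variable names are hypothetical.

```powershell
# Hypothetical list of leading numbers, taken from the question
$Prefixes = '18600', '18113', '19000'

# Build an anchored alternation such as ^(?:18600|18113|19000)\b
# ^ pins the match to the start of the row; \b stops 18600 from matching 186001
$Pattern = '^(?:{0})\b' -f ($Prefixes -join '|')

# Sample rows so the sketch is self-contained; for real use, read them with Get-Content
$Rows = @(
    '18600 - ABCD 2204 2020-04-11 00:00:00'
    '18113 - ABCD 2204 2020-04-11 00:00:00'
    '19873 - ABCD 2204 2020-04-11 00:00:00'
    '18764 - ABCD 2204 2020-04-11 00:00:00'
    '19000 - ABCD 2204 2020-04-11 00:00:00'
)

# With an array on the left, -match and -notmatch act as filters
$Hits = @($Rows -match $Pattern)
$Rest = @($Rows -notmatch $Pattern)
```

$Hits and $Rest can then each be written out in one call, for example with Set-Content, which avoids reopening the output file for every row.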
Hope this helps.
Upvotes: 0