WBCha

Reputation: 11

Split Text file using Powershell

I am trying to split a text file into two files using PowerShell, based on a set of strings. The file sizes range from 5 KB to 15 KB.

The file data is formatted as in the example below:

18600 - ABCD 2204 2020-04-11 00:00:00

18600 - ABCD 2204 2020-04-11 00:00:00

18600 - ABCD 2204 2020-04-11 00:00:00

18113 - ABCD 2204 2020-04-11 00:00:00

18113 - ABCD 2204 2020-04-11 00:00:00

19873 - ABCD 2204 2020-04-11 00:00:00

18764 - ABCD 2204 2020-04-11 00:00:00

19000 - ABCD 2204 2020-04-11 00:00:00

I need to write all rows that begin with 18600, 18113, 19000, etc. (or any set of specified 5-digit numbers) to one file, and all remaining rows that do not begin with those numbers to a second file.

So the logic is: for each line in the file, if it begins with one of the specified numbers, write it to "file1"; otherwise write it to "file2".

$file = (Get-Content myfile.txt)
ForEach ($line in $file) {
    If ($line -match a set of strings) {
        $newfile = all lines with set of beginning strings
    }
    Else {
        $line | Out-File -Append different file
    }
}

I'm also open to any other suggestions outside of PowerShell. Thank you so much for your help.

Upvotes: 0

Views: 403

Answers (2)

Lee_Dailey

Reputation: 7489

presuming that you want all the lines that start with a number in the 18000..18999 range, this does the job ... [grin]

what it does ...

  • set the constants
  • creates a file to work with
    when ready to do this with your data, replace the entire #region/#endregion block with a call to Get-Content.
  • loads the input file
  • iterates thru that collection
  • splits the current line to get the part before the 1st space
  • converts that to an [int]
  • checks to see if it is in the desired range
  • if YES, sends it to the 18 file
  • if NO, sends it to the not-18 file

this code ...

  • lacks any significant error handling
  • does not keep track of what was done
  • does not show what is going on

the code ...

$SourceDir = "$env:TEMP\WBCha"
$TargetNumberRange = 18000..18999
$InFile = Join-Path -Path $SourceDir -ChildPath 'InFile.txt'
$18OutFile = Join-Path -Path $SourceDir -ChildPath '18_OutFile.txt'
$Not_18OutFile = Join-Path -Path $SourceDir -ChildPath 'Not_18OutFile.txt'

#region >>> create a file to work with
#    when ready to do this for real, replace the whole "region" block with a Get-Content call
if (-not (Test-Path -LiteralPath $SourceDir))
    {
    $Null = New-Item -Path $SourceDir -ItemType 'Directory' -ErrorAction 'SilentlyContinue'
    }
$HowManyLines = 1e1
$Content = foreach ($Line in 0..$HowManyLines)
    {
    $Prefix = @(18,19)[(Get-Random -InputObject @(0, 1))]
    '{0}{1:d3} - {2}' -f $Prefix, $Line, [datetime]::Now.ToString('yyyyy-MM-dd HH:mm:ss:ffff')
    }
$Content |
    Set-Content -LiteralPath $InFile -ErrorAction 'SilentlyContinue'
#endregion >>> create a file to work with


foreach ($IF_Item in (Get-Content -LiteralPath $InFile))
    {
    if ([int]$IF_Item.Split(' ')[0] -in $TargetNumberRange)
        {
        Add-Content -LiteralPath $18OutFile -Value $IF_Item
        }
        else
        {
        Add-Content -LiteralPath $Not_18OutFile -Value $IF_Item
        }
    }

the 18 file content ...

18000 - 02020-07-10 12:29:45:6736
18001 - 02020-07-10 12:29:45:6736
18004 - 02020-07-10 12:29:45:6746
18005 - 02020-07-10 12:29:45:6756
18006 - 02020-07-10 12:29:45:6756
18008 - 02020-07-10 12:29:45:6766
18010 - 02020-07-10 12:29:45:6766

the not 18 file content ...

19002 - 02020-07-10 12:29:45:6746
19003 - 02020-07-10 12:29:45:6746
19007 - 02020-07-10 12:29:45:6756
19009 - 02020-07-10 12:29:45:6766
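
If the prefixes are an arbitrary set rather than one contiguous range (as the 18600, 18113, 19000 example in the question suggests), the same loop can test membership in a list instead. The following is only a minimal sketch under that assumption: the prefix list, the demo folder name, and the output file names are all hypothetical, and the sample lines are copied from the question.

```powershell
# Hypothetical prefix list; replace with the real values.
$TargetPrefixes = '18600', '18113', '19000'

# Sample lines copied from the question; for real data use Get-Content instead.
$Lines = @(
    '18600 - ABCD 2204 2020-04-11 00:00:00'
    '18113 - ABCD 2204 2020-04-11 00:00:00'
    '19873 - ABCD 2204 2020-04-11 00:00:00'
    '18764 - ABCD 2204 2020-04-11 00:00:00'
    '19000 - ABCD 2204 2020-04-11 00:00:00'
)

# Hypothetical output locations inside a demo folder under the temp directory.
$OutDir = Join-Path -Path ([System.IO.Path]::GetTempPath()) -ChildPath 'SplitByPrefixDemo'
$Null = New-Item -Path $OutDir -ItemType 'Directory' -Force
$MatchFile = Join-Path -Path $OutDir -ChildPath 'file1.txt'
$OtherFile = Join-Path -Path $OutDir -ChildPath 'file2.txt'

# Start clean so repeated runs do not keep appending to old output.
Remove-Item -LiteralPath $MatchFile, $OtherFile -ErrorAction 'SilentlyContinue'

foreach ($Line in $Lines)
    {
    # The text before the first space is the 5-digit prefix; compare it as a string.
    if ($Line.Split(' ')[0] -in $TargetPrefixes)
        {
        Add-Content -LiteralPath $MatchFile -Value $Line
        }
        else
        {
        Add-Content -LiteralPath $OtherFile -Value $Line
        }
    }
```

Comparing the prefix as a string avoids the [int] conversion, so a line whose first token is not numeric simply lands in the second file instead of raising an error.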

Upvotes: 1

Yash Gupta

Reputation: 2495

Assuming that you want to separate the rows that start with a number into one file and the rows that do not into another, you can use the -match operator with a regex to test each row of your text file.

The code snippet goes something like this:

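# $fileData is assumed to hold the whole file as one string, e.g. from Get-Content -Raw
$processText = $fileData.Split([Environment]::NewLine, [StringSplitOptions]::RemoveEmptyEntries)
foreach ($row in $processText)
{
     if ($row -match '^\d') # Regex to check whether the first character of $row is a digit
     {
         # -Append adds to the file instead of overwriting it on every row
         $row | Out-File -FilePath "D:\DataStartingWithNum.text" -Append
     }
     else
     {
         $row | Out-File -FilePath "D:\DataStartingWithText.text" -Append
     }
}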

If you have any other condition as well (which you might have missed explaining in your question above), you can filter on any leading pattern in a similar way by passing a suitable regex to the -match operator.
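
For instance, the specific prefixes from the question could be folded into one anchored regex. This is only a sketch, not a tested solution for your real data: the prefix list and variable names are assumptions, the sample rows are copied from the question, and the .Where() method with 'Split' mode needs PowerShell 4 or later.

```powershell
# Hypothetical prefix list; replace with the real values.
$Prefixes = '18600', '18113', '19000'

# Builds '^(?:18600|18113|19000)\b'; [regex]::Escape guards against special characters.
$Pattern = '^(?:{0})\b' -f (($Prefixes | ForEach-Object { [regex]::Escape($_) }) -join '|')

# Sample rows from the question; for real data use: $Rows = Get-Content -Path 'myfile.txt'
$Rows = @(
    '18600 - ABCD 2204 2020-04-11 00:00:00'
    '18113 - ABCD 2204 2020-04-11 00:00:00'
    '19873 - ABCD 2204 2020-04-11 00:00:00'
    '18764 - ABCD 2204 2020-04-11 00:00:00'
    '19000 - ABCD 2204 2020-04-11 00:00:00'
)

# 'Split' mode returns two collections: rows that pass the test, then the rest.
$Matching, $Remaining = $Rows.Where({ $_ -match $Pattern }, 'Split')

# Show both groups; to save them, pipe each to Set-Content with a path of your choice.
$Matching
'---'
$Remaining
```

Collecting both groups first means each output file can be written once with Set-Content, rather than being opened again for every row.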

Hope this helps.

Upvotes: 0
