user176047

Reputation: 371

PowerShell script to extract the text between keywords

I am looking to extract data from a text file and write it to other text files. Here is the content of the file:

HAC 06: CATHETHER-ASSOCIATED URINARY TRACT INFECTION (UTI)
SECONDARY DIAGNOSIS

  T8351XA CC  Infection and inflammatory reaction due to indwelling urinary catheter, initial encounter

OR SECONDARY DIAGNOSIS

  B3741     Candidal cystitis and urethritis
  B3749     Other urogenital candidiasis
  N10   CC  Acute tubulo-interstitial nephritis
  N340  CC  Urethral abscess
  N390  CC  Urinary tract infection, site not specified

WITH SECONDARY DIAGNOSIS

  T8351XA CC  Infection and inflammatory reaction due to indwelling urinary catheter, initial encounter

HAC 07: VASCULAR CATHETHER-ASSOCIATED INFECTION
SECONDARY DIAGNOSIS

  T80211A CC  Bloodstream infection due to central venous catheter, initial encounter
  T80212A CC  Local infection due to central venous catheter, initial encounter
  T80218A CC  Other infection due to central venous catheter, initial encounter
  T80219A CC  Unspecified infection due to central venous catheter, initial encounter

HAC 08: SURGICAL SITE INFECTION-MEDIASTINITIS AFTER CORONARY BYPASS GRAFT (CABG)
PROCEDURES

  0210093 Bypass Coronary Artery, One Site from Coronary Artery with Autologous Venous Tissue, Open Approach
  0210098 Bypass Coronary Artery, One Site from Right Internal Mammary with Autologous Venous Tissue, Open Approach

I want to split it into three files, one each for the content under HAC 06, HAC 07 and HAC 08.

The HAC 06 file will contain:

HAC 06: CATHETHER-ASSOCIATED URINARY TRACT INFECTION (UTI)
SECONDARY DIAGNOSIS

  T8351XA CC  Infection and inflammatory reaction due to indwelling urinary catheter, initial encounter

OR SECONDARY DIAGNOSIS

  B3741     Candidal cystitis and urethritis
  B3749     Other urogenital candidiasis
  N10   CC  Acute tubulo-interstitial nephritis
  N340  CC  Urethral abscess
  N390  CC  Urinary tract infection, site not specified

WITH SECONDARY DIAGNOSIS

  T8351XA CC  Infection and inflammatory reaction due to indwelling urinary catheter, initial encounter

The HAC 07 file will contain the following, and so on:

HAC 07: VASCULAR CATHETHER-ASSOCIATED INFECTION
SECONDARY DIAGNOSIS

  T80211A CC  Bloodstream infection due to central venous catheter, initial encounter
  T80212A CC  Local infection due to central venous catheter, initial encounter
  T80218A CC  Other infection due to central venous catheter, initial encounter
  T80219A CC  Unspecified infection due to central venous catheter, initial encounter

I started with this code:

$filename = "HAC.txt"
$output_file = "extract_$HAC06"

$extract = @()
select-string -path $filename -pattern "HAC" -context 0,1 |
    foreach-object {
    $extract += $_.line
    $extract += $_.context.postcontext
    }

$extract | out-file $output_file

But I am stuck. Any help would be appreciated.

Upvotes: 0

Views: 271

Answers (1)

TheMadTechnician

Reputation: 36332

You can import all of the text as one multi-line string, split it at each HAC header line, and then export each piece to a file named after the HAC number in its first line. Something like this:

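# Read every line and rejoin them into one string; the lookahead (?=...) splits before each "HAC <digit>" and keeps the header
$AllText = (Get-Content "HAC.txt") -join "`r`n"
# Keep only the pieces that start with a HAC header, then name each file after the captured "HAC nn"
$AllText -split "(?=HAC \d)" | Where-Object { $_ -match "^(HAC \d+)" } | ForEach-Object { Set-Content -Value $_ -Path ($Matches[1] + '.txt') }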

That will write three files to the current directory, HAC 06.txt, HAC 07.txt and HAC 08.txt, each containing the section that starts at its HAC header.
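
One caveat: the pattern above splits wherever "HAC" followed by a digit appears, even in the middle of a line. If that could happen in your data, a stricter variant is sketched below. It assumes every header starts its line and has the form "HAC <digits>:", as in your sample. The function name Split-HacSection and the inline sample text are made up for illustration; they are not part of the original file.

```powershell
# Hypothetical helper: returns one object per HAC section found in $Text.
function Split-HacSection {
    param([Parameter(Mandatory)][string]$Text)

    # (?m) lets ^ match at the start of every line, so the text is split only
    # where a line begins with "HAC <digits>:". The lookahead keeps the header.
    $Text -split '(?m)^(?=HAC \d+:)' | ForEach-Object {
        # Skip any piece that does not begin with a header (e.g. leading text)
        if ($_ -match '^(HAC \d+):') {
            [pscustomobject]@{
                Name    = $Matches[1]      # e.g. "HAC 06"
                Content = $_.TrimEnd()     # drop trailing blank lines
            }
        }
    }
}

# Small inline sample standing in for HAC.txt (illustrative only)
$sample = @'
HAC 06: FIRST
  A1 code one

HAC 07: SECOND
  B2 see HAC 06 above
'@

$sections = @(Split-HacSection -Text $sample)
$sections | ForEach-Object { "{0}: {1} characters" -f $_.Name, $_.Content.Length }
```

To use it on the real file, you could pass `(Get-Content "HAC.txt" -Raw)` as `-Text` and pipe each object to `Set-Content -Path ($_.Name + '.txt') -Value $_.Content`.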

Edit: OK, if you want to change where the files are written, we can add an output folder like this:

# Keep the trailing backslash: the folder and file name are concatenated as plain strings
$OutFolder = 'C:\Path\For\Output\'
$AllText = (Get-Content "HAC.txt") -join "`r`n"
$AllText -split "(?=HAC \d)" | Where-Object { $_ -match "^(HAC \d+)" } | ForEach-Object { Set-Content -Value $_ -Path ($OutFolder + $Matches[1] + '.txt') }
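
If you would rather not depend on the trailing backslash, or the folder might not exist yet, something along these lines should work. This is only a sketch: the temp-folder location, the folder name hac-output and the inline sample text are placeholders chosen for illustration, so substitute your own path and the text read from HAC.txt.

```powershell
# Placeholder output folder under the user's temp directory (illustrative only)
$OutFolder = Join-Path ([System.IO.Path]::GetTempPath()) 'hac-output'

# -Force makes this succeed whether or not the folder already exists
New-Item -ItemType Directory -Path $OutFolder -Force | Out-Null

# Inline sample standing in for the joined content of HAC.txt
$AllText = "HAC 06: FIRST`r`n  A1 code one`r`n`r`nHAC 07: SECOND`r`n  B2 code two`r`n"

$AllText -split '(?=HAC \d)' | ForEach-Object {
    if ($_ -match '^(HAC \d+)') {
        # Join-Path adds the separator itself, so $OutFolder needs no trailing "\"
        $target = Join-Path $OutFolder ($Matches[1] + '.txt')
        Set-Content -Value $_ -Path $target
    }
}
```

Join-Path and New-Item are standard cmdlets; the only change from the code above is how the target path is built.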

Upvotes: 1
