Jasdeep
Jasdeep

Reputation: 79

Script to batch find and replace specific text then add 3 spaces after the replaced text in .txt file

I am fairly new to scripting and have searched around on the internet but cannot seem to find a specific solution for what I'm trying to achieve so I'm hoping someone could please shed some light.

I have a .txt file that contains various lines of data which is organised by text starting at specific column numbers - basically a table of data. See example below which shows where each column starts:

 |            |                                      |
 |1214000     |1234567890                            |ISRBWPX0001000001
 |            |                                      |
 |MD-3300     |+12345678912                          |MDABWPX0001000001
 |            |                                      |
 |            |                                      |
 |            |                                      |
 Col:620      Col:632                                Col:672

Please click here for screenshot if above example makes no sense

I want the script to find all lines which contains 'MD-' in column 620 and delete this so only the number is left. So I ran the Replace command in PowerShell which did delete all lines containing 'MD-' however it misaligned the rest of the columns;

PowerShell Command used:

(Get-Content "test.txt") | 
Foreach-Object {$_.replace("MD-", "")} | 
Set-Content "testedited.txt"

Output of above command:

 |            |                                      |
 |1214000     |1234567890                            |ISRBWPX0001000001
 |            |                                      |
 |3300     |+12345678912                          |MDABWPX0001000001
 |            |                                      |
 |            |                                      |
 |            |                                      |
 Col:620      Col:632                                Col:672

Click here for screenshot if above example makes no sense

As you can see '+12345678912' is no longer aligned with Column 632 as well as 'MDABWPX0001000001' which is no longer aligned with Column 672.

Is there a way to carry out the above command without affecting other columns? I read somewhere that Hash tables can do this however I did not completely understand the method.

Desired output:

 |            |                                      |
 |1214000     |1234567890                            |ISRBWPX0001000001
 |            |                                      |
 |3300        |+12345678912                          |MDABWPX0001000001
 |            |                                      |
 |            |                                      |
 |            |                                      |
 Col:620      Col:632                                Col:672

Please click here to see screenshot of desired output

I am open to use any scripting languages / methods to carry out this task so any suggestions would be much appreciated.

Thank you so much in advance.

Upvotes: 2

Views: 302

Answers (3)

lit
lit

Reputation: 16236

It is easy enough to do this with a -replace regex. I tested with 'MD' starting in the 11th column. Change it to 620, or whatever it needs to be.

(Get-Content "test.txt") |
    ForEach-Object { $_ -replace '^(.{11})MD\-([^ ]*|)(.*)$', '$1$2   $3' } |
    Set-Content "testedited.txt"

Here is the test data and sample run.

PS C:\src\t\repmd> Get-Content .\test.txt
0123456789|asdf    |asdfdsaf
0123456789|MD-333  |asdfdsaf
0123456789|cwqw    |asdfdsaf
0123456789|cwqwasda|asdfdsaf
0123456789|cwqw    |asdfdsaf
0123456789|cwqw    |asdfdsaf
PS C:\src\t\repmd> .\repmd.ps1
PS C:\src\t\repmd> Get-Content .\testedited.txt
0123456789|asdf    |asdfdsaf
0123456789|333     |asdfdsaf
0123456789|cwqw    |asdfdsaf
0123456789|cwqwasda|asdfdsaf
0123456789|cwqw    |asdfdsaf
0123456789|cwqw    |asdfdsaf

The regex breaks down like this.

^        beginning of string
(.{11})  capture 1 - eleven (11) characters
MD\-     literal 'MD-' (the '-' character needs to be escaped with \
([^ ]*|) capture 2 - all non-space characters until a VERTICAL LINE
(.*)     capture 3 - all remaining characters
$        end of string

'$1$2 $3' produces the captured strings. The three (3) spaces before $3 replace the three (3) characters take out by 'MD-'.

Upvotes: 1

Aacini
Aacini

Reputation: 67216

@echo off
setlocal EnableDelayedExpansion

rem Change next line by 620 and 12
set /A "pos=11, wide=8"
set /A "posP3=pos+3, rest=wide-3, posPwide=pos+wide"

(for /F "delims=" %%a in (Input.txt) do (
   set "line=%%a"
   if "!line:~%pos%,3!" equ "MD-" (
      set "line=!line:~0,%pos%!!line:~%posP3%,%rest%!   !line:~%posPwide%!"
   )
   echo !line!
)) > Output.txt

Input.txt:

0123456789|asdf    |asdfdsaf
0123456789|MD-333  |asdfdsaf
0123456789|cwqw    |asdfdsaf
0123456789|cwqwasda|asdfdsaf
0123456789|cwqw    |asdfdsaf
0123456789|cwqw    |asdfdsaf

Output.txt:

0123456789|asdf    |asdfdsaf
0123456789|333     |asdfdsaf
0123456789|cwqw    |asdfdsaf
0123456789|cwqwasda|asdfdsaf
0123456789|cwqw    |asdfdsaf
0123456789|cwqw    |asdfdsaf

Upvotes: 0

Nas
Nas

Reputation: 1263

$_ -replace '(?<=.{620})MD-([0-9]+)',('$1'+' '*3)

Upvotes: 1

Related Questions