Reputation: 79
I am fairly new to scripting and have searched around on the internet but cannot seem to find a specific solution for what I'm trying to achieve so I'm hoping someone could please shed some light.
I have a .txt file that contains various lines of data which is organised by text starting at specific column numbers - basically a table of data. See example below which shows where each column starts:
| | |
|1214000 |1234567890 |ISRBWPX0001000001
| | |
|MD-3300 |+12345678912 |MDABWPX0001000001
| | |
| | |
| | |
Col:620 Col:632 Col:672
Please click here for screenshot if above example makes no sense
I want the script to find all lines which contains 'MD-' in column 620 and delete this so only the number is left. So I ran the Replace command in PowerShell which did delete all lines containing 'MD-' however it misaligned the rest of the columns;
PowerShell Command used:
(Get-Content "test.txt") |
Foreach-Object {$_.replace("MD-", "")} |
Set-Content "testedited.txt"
Output of above command:
| | |
|1214000 |1234567890 |ISRBWPX0001000001
| | |
|3300 |+12345678912 |MDABWPX0001000001
| | |
| | |
| | |
Col:620 Col:632 Col:672
Click here for screenshot if above example makes no sense
As you can see '+12345678912' is no longer aligned with Column 632 as well as 'MDABWPX0001000001' which is no longer aligned with Column 672.
Is there a way to carry out the above command without affecting other columns? I read somewhere that Hash tables can do this however I did not completely understand the method.
Desired output:
| | |
|1214000 |1234567890 |ISRBWPX0001000001
| | |
|3300 |+12345678912 |MDABWPX0001000001
| | |
| | |
| | |
Col:620 Col:632 Col:672
Please click here to see screenshot of desired output
I am open to use any scripting languages / methods to carry out this task so any suggestions would be much appreciated.
Thank you so much in advance.
Upvotes: 2
Views: 302
Reputation: 16236
It is easy enough to do this with a -replace regex. I tested with 'MD' starting in the 11th column. Change it to 620, or whatever it needs to be.
(Get-Content "test.txt") |
ForEach-Object { $_ -replace '^(.{11})MD\-([^ ]*|)(.*)$', '$1$2 $3' } |
Set-Content "testedited.txt"
Here is the test data and sample run.
PS C:\src\t\repmd> Get-Content .\test.txt
0123456789|asdf |asdfdsaf
0123456789|MD-333 |asdfdsaf
0123456789|cwqw |asdfdsaf
0123456789|cwqwasda|asdfdsaf
0123456789|cwqw |asdfdsaf
0123456789|cwqw |asdfdsaf
PS C:\src\t\repmd> .\repmd.ps1
PS C:\src\t\repmd> Get-Content .\testedited.txt
0123456789|asdf |asdfdsaf
0123456789|333 |asdfdsaf
0123456789|cwqw |asdfdsaf
0123456789|cwqwasda|asdfdsaf
0123456789|cwqw |asdfdsaf
0123456789|cwqw |asdfdsaf
The regex breaks down like this.
^ beginning of string
(.{11}) capture 1 - eleven (11) characters
MD\- literal 'MD-' (the '-' character needs to be escaped with \
([^ ]*|) capture 2 - all non-space characters until a VERTICAL LINE
(.*) capture 3 - all remaining characters
$ end of string
'$1$2 $3'
produces the captured strings. The three (3) spaces before $3 replace the three (3) characters take out by 'MD-'.
Upvotes: 1
Reputation: 67216
@echo off
setlocal EnableDelayedExpansion
rem Change next line by 620 and 12
set /A "pos=11, wide=8"
set /A "posP3=pos+3, rest=wide-3, posPwide=pos+wide"
(for /F "delims=" %%a in (Input.txt) do (
set "line=%%a"
if "!line:~%pos%,3!" equ "MD-" (
set "line=!line:~0,%pos%!!line:~%posP3%,%rest%! !line:~%posPwide%!"
)
echo !line!
)) > Output.txt
Input.txt:
0123456789|asdf |asdfdsaf
0123456789|MD-333 |asdfdsaf
0123456789|cwqw |asdfdsaf
0123456789|cwqwasda|asdfdsaf
0123456789|cwqw |asdfdsaf
0123456789|cwqw |asdfdsaf
Output.txt:
0123456789|asdf |asdfdsaf
0123456789|333 |asdfdsaf
0123456789|cwqw |asdfdsaf
0123456789|cwqwasda|asdfdsaf
0123456789|cwqw |asdfdsaf
0123456789|cwqw |asdfdsaf
Upvotes: 0