Reputation: 93
I have a source file which is in .txt format. It looks like a semi-colon separated file:
100;200;ThisisastringcolumnA;4;
101;400;Thisisastringc;lumnA;5;
102;600;ThisisastringcolumnB;6;
104;600;Thisisa;;ringcolumnB;6;
However, it is determined by length. So it is a length-delimited file.
Fist column for example is from first value to the third (100), then a semi-colon follows. Second column starts at 5th position (including), until (including) 7th position. A string column can contain a semi-colon.
Now I want to import this length-delimited txt file with Powershell and export it as a csv file. This file should be really semi-colon separated. The result should look like
100;200;ThisisastringcolumnA;4;
101;400;"Thisisastringc;lumnA";5;
102;600;ThisisastringcolumnB;6;
104;600;"Thisisa;;ringcolumnB";6;
But I have simply no idea how to do it? I googled it, but I did not find that much useful code examples for importing length-delimited txt files with PowerShell.
Unfortunately, I cannot use Python. I am not sure, if this task is generally possible using Powershell? Because when exporting, Powershell also needs to recognize that there are string values containing the separator, so it has to pay attention to the quoting: "Thisisa;;ringcolumnB". I think it would be also ok for me, if the whole column is quoted, so every entry in a string column gets quotes added.
Upvotes: 0
Views: 53
Reputation: 174485
You can use regex to describe a string in which the 3rd "column" contains a ;
and then inject the quotation marks with the -replace
operator:
$lines = Get-Content path\to\file.txt
@($lines) -replace '(.{3});(.{3});(.{20}(?<=;.{0,19}));(.);', '$1;$2;"$3";$4;'
The expression (.{20}(?<=;.{0,19}))
is going to match the 20-char 3rd column value only if it contains at least one semi-colon - so lines with no semicolon in that column will be left alone:
# let's try it out with your test data
$lines = @'
100;200;ThisisastringcolumnA;4;
101;400;Thisisastringc;lumnA;5;
102;600;ThisisastringcolumnB;6;
104;600;Thisisa;;ringcolumnB;6;
'@ -split '\r?\n'
@($lines) -replace '(.{3});(.{3});(.{20}(?<=;.{0,19}));(.);', '$1;$2;"$3";$4;'
Which yields the following four strings:
100;200;ThisisastringcolumnA;4;
101;400;"Thisisastringc;lumnA";5;
102;600;ThisisastringcolumnB;6;
104;600;"Thisisa;;ringcolumnB";6;
To write the output back to file, use Set-Content
:
@($lines) -replace '(.{3});(.{3});(.{20}(?<=;.{0,19}));(.);', '$1;$2;"$3";$4;' |Set-Content path\to\fixed_output.scsv
Upvotes: 1