Merrill Cook
Merrill Cook

Reputation: 1208

Powershell .txt to CSV Formatting Irregularities

I have a large number of .txt files pulled from pdf and formatted with comma delimiters.

I'm trying to append these text files to one another with a new line between each. Earlier in the formatting process I took multi-line input and formatted it into one line with entries separated by commas.

Yet when appending one txt file to another in a csv the previous formatting with many line breaks returns. So my final output is valid csv, but not representative of each text file being one line of csv entries. How can I ensure the transition from txt to csv retains the formatting of the txt files?

I've used Export-CSV, Add-Content, and the >> operator with similar outcomes.

To summarize, individual .txt files with the following format:

,927,Dance like Misty"," shine like Lupita"," slay like Serena. speak like Viola"," fight like Rosa! ,United States ,16 - 65+

Turn into the following when appended together in a csv file:

,927
,Dance like Misty"," shine like Lupita"," slay like Serena. speak like Viola"," fight like Rosa!
,United States
,16 - 65+

How the data was prepped:

Removing new lines

Foreach($f in $FILES){(Get-Content $f -Raw).Replace("`n","") | Set-Content $f -Force}

Adding one new line to the end of each txt file

 foreach($f in $FILES){Add-Content -Path $f -value "`n" |Set-Content $f -Force}

Trying to Convert to CSV, one text file per line with comma delimiter:

cat $FILES | sc csv.csv

Or

 foreach($f in $FILES){import-csv $f -delimiter "," | export-csv $f}

Or

foreach($f in $FILES){ Export-Csv -InputObject $f -append -path "test.csv"}

Return csv with each comma separated value on a new line, instead of each txt file as one line.

Upvotes: 1

Views: 388

Answers (1)

Merrill Cook
Merrill Cook

Reputation: 1208

This was resolved by realizing that even though notepad was showing no newlines, there were still hidden return carriage characters. On loading the apparently one line csv files into Notepad++ and toggling "show hidden characters" this oversight was evident.

By replacing both \r and \n characters before converting to CSV,

Foreach($f in $FILES){(Get-Content $f -Raw).Replace("\n","").Replace("\r","" | 
Set-Content $f -Force}

The CSV conversion process worked as planned using the following

cat $FILES | sc final.csv

Final verdict -- The text file that appeared to be a one line entry ready to become CSV

,927,Dance like Misty"," shine like Lupita"," slay like Serena. speak like Viola"," fight like Rosa! ,United States ,16 - 65+

Still had return carriage characters between each value. This was made evident by trying another text editor with the feature "show hidden characters."

Upvotes: 1

Related Questions