Sune

Reputation: 3270

Format of CSV not correct?

I am generating a CSV with Export-Csv in PowerShell and then feeding it to a Perl script, but Perl is unable to import the file.

I have compared the CSV file against a working version (exported from the same Perl script rather than from PowerShell) and there is NO visible difference. The columns are exactly the same and both files use a semicolon as the delimiter. If I open the file in Excel, however, everything ends up in the first cell on each line (meaning I have to do a text-to-columns). The working file is split into separate cells from the start.

To add to the confusion: when I open the file in Notepad and copy/paste the contents into a new file, the import works!

So, what am I missing? Are there "hidden" properties that I cannot spot with Notepad? Do I have to change the encoding?

Please help:)

Upvotes: 4

Views: 1693

Answers (5)

Daniel Richnak

Reputation: 1604

Given what has been discovered through the other posts, I think your best bet is to:

  1. Convert to a CSV string (which uses Unix-style line endings rather than Windows ones)
  2. Send that to a file, ensuring the encoding is not ASCII.

$str = $object | ConvertTo-Csv -NoTypeInformation | ForEach-Object { $_ -replace "`"","" }  # strip every double quote

The ForEach-Object step is a hack to remove the extra quotes that ConvertTo-Csv adds. If your data may contain double quotes, you'll need to look at alternatives.

$str | Out-File -FilePath "path\to\newcsv" -Encoding UTF8

Upvotes: 0

dgw

Reputation: 13646

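Adapted from the CPAN Text::CSV synopsis (sep_char is added here because your file is semicolon-delimited; the default separator is a comma):

use Text::CSV;

my @rows;
# binary => 1 allows non-ASCII data; sep_char => ";" matches the semicolon delimiter.
my $csv = Text::CSV->new ( { binary => 1, sep_char => ";" } )
             or die "Cannot use CSV: ".Text::CSV->error_diag();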

open my $fh, "<:encoding(utf8)", "test.csv" or die "test.csv: $!";
while ( my $row = $csv->getline( $fh ) ) {
  $row->[2] =~ m/pattern/ or next; # 3rd field should match
  push @rows, $row;
}
$csv->eof or $csv->error_diag();
close $fh;

Never try to parse CSV yourself. It seems easy at first glance but has a lot of deep pits to fall into.
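To illustrate one of those pits, here is a minimal sketch in Python, with the standard `csv` module standing in for Text::CSV. The sample line is invented for the example: it has a semicolon inside a quoted field and a doubled quote, both of which a naive split on `;` gets wrong.

```python
import csv
import io

# Invented sample line: quoted field containing the delimiter, plus an escaped quote.
line = '"Smith; John";42;"said ""hi"""\r\n'

# Naive approach: split on the delimiter and ignore quoting rules.
naive = line.rstrip("\r\n").split(";")

# Parser approach: let a CSV reader apply the quoting rules.
parsed = next(csv.reader(io.StringIO(line, newline=""), delimiter=";"))

print(len(naive), naive)    # the quoted field is cut in two and the quotes remain
print(len(parsed), parsed)  # three clean fields
```

The same reasoning applies to the Perl side: a real parser handles quoting, embedded delimiters and line endings so that the script does not have to.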

Upvotes: 1

Andy Arismendi

Reputation: 52577

To get a better look at your CSV files, try using Notepad++. It shows the file encoding in the status bar. Also turn on hidden characters (View > Show Symbol > Show All Characters). This will reveal whether the lines end in just line feeds or in carriage returns + line feeds, whether tabs or spaces are used, etc. You can also change the file encoding from the Encoding menu. This may help you identify the differences. Notepad doesn't display any of this information.
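If you would rather check this from a script than from an editor, the same inspection can be sketched in a few lines of Python. This is only an illustration: `describe_bytes` is a hypothetical helper name, the sample bytes are invented, and the line-ending counts are only meaningful for ASCII or UTF-8 data (in a UTF-16 file the bytes are interleaved with zero bytes).

```python
import codecs

def describe_bytes(data: bytes) -> dict:
    """Report a leading byte order mark and the line-ending counts of raw file contents."""
    boms = [
        (codecs.BOM_UTF8, "UTF-8 BOM"),
        (codecs.BOM_UTF16_LE, "UTF-16 LE BOM"),
        (codecs.BOM_UTF16_BE, "UTF-16 BE BOM"),
    ]
    bom = next((name for mark, name in boms if data.startswith(mark)), None)
    crlf = data.count(b"\r\n")
    return {
        "bom": bom,
        "crlf": crlf,                           # Windows line endings
        "lf_only": data.count(b"\n") - crlf,    # Unix line endings
        "cr_only": data.count(b"\r") - crlf,    # bare carriage returns
    }

# Invented sample: UTF-8 BOM followed by two CRLF-terminated lines.
sample = codecs.BOM_UTF8 + b"a;b\r\n1;2\r\n"
print(describe_bytes(sample))
```

For a real file, read it in binary mode (for example `open(path, "rb").read()`) and compare the report for the working file with the one for the failing file.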

Update - Here's how to convert a text file from Windows to Unix format in code:

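# Replace CRLF (or a lone CR) with LF.
$allText = [IO.File]::ReadAllText("C:\test.csv") -replace "`r`n?", "`n"
# ASCII writes no BOM; any non-ASCII character would be written as '?'.
$encoding = New-Object System.Text.ASCIIEncoding
[IO.File]::WriteAllText("C:\test2.csv", $allText, $encoding)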

Or you can use Notepad++ (Edit > EOL Conversion > Unix Format).

Upvotes: 6

manojlds

Reputation: 301087

It could be an encoding issue when you are using Export-Csv.

The default is ASCII, which should usually be fine, but try setting -Encoding UTF8 in the Export-Csv command.
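One thing to watch for when switching: in Windows PowerShell, -Encoding UTF8 writes a byte order mark (BOM) at the start of the file, and a reader that does not strip it will see it glued to the first column name. The Python sketch below only illustrates that effect; the sample bytes and the `first_header` helper are invented for the example, and whether your Perl script is affected depends on how it opens the file.

```python
import csv
import io

# Invented sample: UTF-8 BOM, then a semicolon-delimited header and one row.
raw = b"\xef\xbb\xbfName;Size\r\nfoo;1\r\n"

def first_header(data: bytes, encoding: str) -> str:
    """Decode the bytes and return the first column name of the header row."""
    text = data.decode(encoding)
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=";")
    return next(reader)[0]

print(repr(first_header(raw, "utf-8")))      # the BOM survives as an invisible U+FEFF prefix
print(repr(first_header(raw, "utf-8-sig")))  # "utf-8-sig" strips the BOM
```

An invisible prefix like this would also fit the symptom that copying the text into a new file makes the import work, although that is a guess rather than something confirmed in the question.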

Upvotes: 2

Michael McGowan

Reputation: 6608

Excel tends to assume that files saved in the .csv format are indeed comma-delimited. However, it seems you are using semicolons. You can try switching to commas, or if that is not an option, try changing the extension to .txt. Excel should automatically recognize it if you do the former, whereas the latter will take you through the import wizard upon loading the file.

Upvotes: 0
