fang yu
fang yu

Reputation: 481

Opening a CSV file created in Mac Excel with Perl

I'm having a bit of trouble with the Perl code below. I can open and read in a CSV file that I've made manually, but if I try to open any Mac Excel spreadsheet that I save as a CSV file, the code below reads it all as a single line.

#!/usr/bin/perl

use strict;
use warnings;

open F, "file.csv";

foreach (<F>)
{
    ($first, $second, undef, undef) = split (',', $_);
}

print "$first : $second\n";

close(F);

Upvotes: 2

Views: 2420

Answers (6)

DMart
DMart

Reputation: 2485

if you set the "special variable" that handles what it considers a newline to \r you'll be able to read one line at a time: $/="\r"; in this particular case the mac new line for perl is default \n but the file is probably using \r. This builds off what Flynn1179 & Mark Thalman said but shows you what to do to use the while () style reading.

Upvotes: 0

Joel Berger
Joel Berger

Reputation: 20280

As others have suspected, your line endings are probably to blame. On my Linux-based system there are builtin utilities to change these line endings. mac2unix (which I think is just a wrapper around dos2unix will read your file and change the line endings for you. You should have something similar both on Linux and Mac (Microsoft may not care about you).

If you want to handle this in Perl, look into setting the $/ variable to set the "input record separator" from "\n" to "\r" (if thats the right ending). Try local $/ = "\r" before you read the file. Read more about it in perldoc perlvar (near $/) or in perldoc perlport (devoted to writing portable Perl code.

P.S. if I have some part of this incorrect let me know, I don't use Mac, I just think I know the theory

Upvotes: 2

pndc
pndc

Reputation: 3805

I've directly experienced this problem with Excel 2004 for Mac. The line endings are indeed \r, and IIRC, the text uses the MacRoman character set, rather than Latin-1 or UTF-8 as you might expect.

So as well as the good advice to use Text::CSV / Text::CSV_XS and splitting on \r, you will want to open the file using the MacRoman encoding like so:

open my $fh, "<:encoding(MacRoman)", $filename
   or die "Can't read $filename: $!";

Likewise, when reading a file exported with Excel on Windows, you may wish to use :encoding(cp1252) instead of :encoding(MacRoman) in that code.

Upvotes: 3

Mark
Mark

Reputation: 6128

Traditional Macintosh (System 9 and previous) uses CR (0x0D, \r) as the line separator. Mac OS X (Unix based) uses LF(0x0A, \n) as the default line separator, so the perl script, being a Unix tool, is probably expecting LF but is getting CR. Since there are no line separators in the file perl thinks there is only one line. If it had Windows line endings (CR,LF) you'd probably be getting an invisible CR at the end of each line.

A quick loop over the input replacing 0x0D with 0x0A should fix your problem.

Upvotes: 5

Flynn1179
Flynn1179

Reputation: 12075

Not sure about Mac excel, but certainly the windows version tends to enclose all values in quotes: "like","this". Also, you need to take into account the possibility of there being a quote in the value, which would show up "like""this" (there's only a single " in that value).

To actually answer your question however, it's likely that it's using a different newline character from what you'd expect. It's probably saving as \r\n instead of \n, or vice versa.

Upvotes: 2

Alan Haggai Alavi
Alan Haggai Alavi

Reputation: 74272

Always use a specialised module (such as Text::CSV or Text::CSV_XS) for this purpose as there are lots of cases where split-ing will not help (for example when the fields contain a comma which is not a field separator but is within quotes).

Upvotes: 10

Related Questions