Andrew Newby

Reputation: 5197

Newlines inside CSV data break Text::CSV reading

I have some sample data (exported from a Google Sheets spreadsheet):

Title,Website URL,Description,Contact Number,Address 1,Address 2,City,State ,Province,Country,Facebook,Twitter,Full Category Name
Sample title,http://www.test.com,"A short description. Press Ctrl+Enter to create a new line

Like this",44 (0)1430 123 123,Some address line,Line 2,The city,The state,Some province,UK,facebook.com/test,twitter.com/test,Full Category/Path/To/Category or ID

I then have this simple script:

use Text::CSV;
my $csv = Text::CSV->new({
    sep_char => ','
});
$csv->column_names("title","url","description","contact_number","address1","address2","city","state","province","country","facebook","twitter","category_name");

# Double quotes so that $USER->{Username} is interpolated into the path
my $file = "./tmp/$USER->{Username}.csv";

open (WRITEIT, ">:encoding(utf8)", $file) or die "can't write $file: $!";
    print WRITEIT join("\n", @data). "\n";
close (WRITEIT) or die "can't close $file: $!";

open my $io, "<:encoding(utf8)", $file or die "$file: $!";

use Data::Dumper;

my $i = 0;
while (my $row = $csv->getline_hr($io)) {

    $i++;

    print Dumper($row);

}

The problem is that I get the following output:

$VAR1 = 'Sample title,http://www.test.com,"A short description. Press Ctrl+Enter to create a new line
';
$VAR2 = '
';
$VAR3 = 'Like this",44 (0)1430 123 123,Some address line,Line 2,The city,The state,Some province,UK,facebook.com/test,twitter.com/test,Full Category/Path/To/Category or ID';

It's treating the \n inside the quoted description as the end of a record, so the row gets split into three pieces. Is there any way around this?

Upvotes: 2

Views: 91

Answers (1)

mob

Reputation: 118605

From the Text::CSV documentation on embedded newlines (https://metacpan.org/pod/Text::CSV#Embedded-newlines):

Important Note: The default behavior is to accept only ASCII characters in the range from 0x20 (space) to 0x7E (tilde). This means that the fields can not contain newlines. If your data contains newlines embedded in fields, or characters above 0x7E (tilde), or binary data, you must set binary => 1 in the call to new. To cover the widest range of parsing options, you will always want to set binary.
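
As a minimal sketch of what that note means in practice, the snippet below builds the parser with binary => 1 and reads a record whose quoted field spans several physical lines. The sample data, the in-memory filehandle and the use of the header row for the hash keys are assumptions made to keep the example self-contained; they are not part of the original script.

```perl
use strict;
use warnings;
use Text::CSV;

# binary => 1 allows quoted fields to contain newlines and non-ASCII characters.
# auto_diag => 1 makes the parser report errors instead of failing silently.
my $csv = Text::CSV->new({ binary => 1, sep_char => ',', auto_diag => 1 });

# Hypothetical sample: a header row plus one record whose quoted
# description is spread over three physical lines.
my $data = qq{Title,Description,City\n}
         . qq{Sample title,"A short description.\n\nLike this",The city\n};

# Read from an in-memory filehandle so the sketch needs no file on disk.
open my $io, '<', \$data or die "Cannot open in-memory data: $!";

# Use the header row as the keys for getline_hr.
$csv->column_names(@{ $csv->getline($io) });

my @rows;
while (my $row = $csv->getline_hr($io)) {
    push @rows, $row;
}
close $io;

printf "%d record(s) read\n", scalar @rows;
print $rows[0]{Description}, "\n";
```

With binary enabled, getline_hr should hand back the three physical lines as a single record, with the newlines preserved inside the Description field. The same option should apply to the script in the question, provided the file is written with the original quoting intact.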

Upvotes: 9
