Reputation: 75
In my csv file there are thousands of records.And those records contain data with comma.How i should escape .I seen few links which mentioned about using module TEXT::CSV.But still issue persist ie recofds gets break where ever comma comes. Here is sample code i have been working:
use Data::Dumper qw(Dumper);
my $file = $ARGV[0] or die "Need to get CSV file on the command line\n";
use Text::CSV;
my $csv = Text::CSV->new({ sep_char => ',' });
my $file = 'Pkg_Notes_small.csv';
my %data;
my %dupes;
open(my $data, '<', $file) or die "Could not open '$file' $!\n";
while (my $line = <$data>)
{
my $catalog_num ='';
my $catalog_sid ='';
my $note_text ='';
my $packing_notes_id ='';
#($catalog_num,$catalog_sid,$packing_notes_id,$note_text) = split ',', $line;
if ($csv->parse($line))
{
my @fields = $csv->fields();
$catalog_num = $fields[0];
$catalog_sid = $fields[1];
$packing_notes_id = $fields[2];
$note_text = $fields[3];
}
}
i will add sample data from csv file :
CATALOG_NUM CATALOG_SID PACKAGING_NOTES_SID PACKAGING_NOTES_DESC
112194 , 521 , 77 , For WI Packaging Operations: For
finishing operations, the use of a protective
comma used above for clarity. Now converting this data to query based on my requirement for which i am getting:
for which Notetext field i am getting is :
For WI Packaging Operations: For finishing operations
remaining part for this line after processing is missed out.the data iam expecting is :
For WI Packaging Operations: For
finishing operations, the use of a protective
Upvotes: 2
Views: 1711
Reputation: 69244
It is possible to have commas in records in CSV files if the fields that contain them are quoted.
field1,field2,"field3, with embedded comma",field4
But Text::CSV knows all about that, so if you're using Text::CSV and still getting truncated records, then it seems likely that the problem is with your input data. If the embedded commas aren't quoted then obviously there is no way for the parser to differentiate between separator commas and commas that are part of the text.
field1,field2,field3, with embedded comma,field4
In the example above, how can the computer possibly know that the third comma should be ignored?
If this is your problem and you can't get the input data fixed, then there is one possible workaround. If the extra commas only ever exist in the last field of a record, then you can use the little-known third parameter to split()
to limit the number of fields created from a record.
So if your record looks like this:
field1,field2,field3,field4 with, embedded, commas
You can use:
my @data = split /,/ $input, 4;
to force split()
to only split the data into four fields. Anything after the third comma is all put into the fourth field.
Usually it's a very bad idea to parse CSV data using split()
(because it doesn't know to ignore quoted commas), but in controlled circumstances like this, it can work well.
Update: Now I have seen some (well, one line!) of your input data, I see that my prediction was right. You have unquoted commas in your data. So using Text::CSV is not going to work.
But my split()
solution works fine. Here's an example:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
while (<DATA>) {
my @fields = split /,/, $_, 4;
say $fields[3];
}
__END__
112194 , 521 , 77 , For WI Packaging Operations: For finishing operations, the use of a protective
And the output I get is:
For WI Packaging Operations: For finishing operations, the use of a protective
Upvotes: 6