Reputation: 512
For work purposes, I have several CSV files which have been modified by a vendor upgrade, and they now contain approximately 80 more columns than they used to. The downside is, these files are used for billing, so we need to trim off the new columns. The upside is all of the columns have been added to the end of the record. The older records contained 251 columns. The new records contain 336.
So, the script I am writing will accept the CSV filename as an argument, edit it in place because the files can be very large, remove the first two lines and the last line, and finally remove the new columns (not just empty their contents, completely remove them, so if the original format had N columns, the new format after being processed should only have N columns)
Here is what I have so far:
use strict;
use warnings;
#Use Tie::File to modify file contents directly on disk, without reading
#to memory.
use Tie::File;
#Use Text::CSV_XS to quickly remove columns from CSV. External library
#used to compensate for quoted fields.
use Text::CSV_XS;
my $csvparser = Text::CSV_XS->new () or die "".Text::CSV_XS->error_diag();
my $file;
foreach $file (@ARGV){
my @CSVFILE;
my $csvparser = Text::CSV_XS->new () or die "".Text::CSV_XS->error_diag();
tie @CSVFILE, 'Tie::File', $file or die $!;
shift @CSVFILE;
shift @CSVFILE;
pop @CSVFILE;
for my $line (@CSVFILE) {
$csvparser->parse($line);
my @fields = $csvparser->fields;
splice @fields, -85;
$line = $csvparser->combine(@fields);
}
untie @CSVFILE;
}
This will run, and the first portion runs correctly (removing the first 2 and last lines). However I am unsure how to proceed with removing the new columns. I have been reading through the docs for Text::CSV_XS and I can't seem to find any functions which would remove a column. Some of the examples may be helpful, but I confess my perl skills aren't very good. My main reason for wanting to use the module is that these CSV files do occasionally contain fields with commas, enclosed in quotes, which the module can handle.
Any suggestions about how to approach this would be wonderful. Also if there is an issue with my approach, please let me know. I'm by no means a perl expert and am open to any helpful criticism since this will be getting fed into a billing system.
EDIT: Included suggestions from below into code. As mentioned below, when running this the source file's contents get replaced with a single "1" on every line.
Upvotes: 3
Views: 663
Reputation: 126722
Yes, you can do what you ask, although I wouldn't hope for any great speed.
Something like this should work
use strict;
use warnings;
use Tie::File;
use Text::CSV_XS;
my $csv = Text::CSV_XS->new or die Text::CSV_XS->error_diag;
foreach my $file (@ARGV) {
tie my @lines, 'Tie::File', $file or die $!;
splice @lines, 0, 2;
pop @lines;
for my $line (@lines) {
$csv->parse($line);
my @fields = $csv->fields;
splice @fields, -80;
$csv->combine(@fields);
$line = $csv->string;
}
untie @lines;
}
Upvotes: 4