Reputation: 111
I'm trying to write a subroutine that takes two arguments: a filename and the name of a column inside that CSV file. The subroutine should search for the second argument (the column name), remove that column (or columns) from the CSV file, and then return the CSV file with those columns removed.
I feel like I've gotten through the first half of this sub (opening the file, retrieving the headers and values), but I can't seem to find a way to search the CSV file for the string that the user inputs and delete that whole column. Any ideas? Here's what I have so far.
sub remove_columns {
    my @Para = @_;
    my $nargs = @Para;
    die "Insufficient arguments\n" if ($nargs < 2);
    my ($file, @cols) = @Para;    # filename, then the column name(s) to remove
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    my $header = <$fh>;
    chomp $header;
    my @hdr = split ',',$header;
    #hash that will allow me to access column name and values quickly
    #(built once, outside the loop, since the header does not change)
    my %h;
    for (my $i=0; $i<=$#hdr;$i++){
        $h{$hdr[$i]}=$i;
    }
    while (my $line = <$fh>){
        chomp $line;
        my @vals = split ',',$line;
        ....
    }
}
Here's where the search and removal will be done. I've been thinking about how to go about this; the CSV files that I'll be modifying will be huge, so speed is a factor, but I can't seem to think of a good way to go about this. I'm new to Perl, so I'm struggling a bit.
Upvotes: 1
Views: 2182
Reputation: 26121
There is an elegant way to remove columns from an array. If the names of the columns to remove are in the array @cols and the headers are in @headers, I can build an array of the indexes to keep:
my %to_delete;
@to_delete{@cols} = ();
my @idxs = grep !exists $to_delete{$headers[$_]}, 0 .. $#headers;
Then it's easy to build the new headers
@headers[@idxs]
and also each new row from the columns that were read
@columns[@idxs]
The same approach can be used, for example, to rearrange arrays. It is very fast and a pretty idiomatic Perl way to do this sort of task.
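To show how the slices fit into a whole pass over a file, here is a minimal sketch. The sub name remove_columns_by_slice and its filehandle arguments are hypothetical, chosen only for illustration, and the code still splits on bare commas, so it assumes no field contains a quoted comma or an embedded newline.

```perl
use strict;
use warnings;

# Hypothetical helper: copy CSV from $in to $out, dropping the named columns.
# Assumes simple CSV: no quoted fields, no embedded commas or newlines.
sub remove_columns_by_slice {
    my ($in, $out, @cols) = @_;

    my $header = <$in>;
    return unless defined $header;
    chomp $header;
    my @headers = split /,/, $header, -1;

    # Set of column names to drop, then the indexes of the columns to keep.
    my %to_delete;
    @to_delete{@cols} = ();
    my @idxs = grep { !exists $to_delete{ $headers[$_] } } 0 .. $#headers;

    print {$out} join(',', @headers[@idxs]), "\n";

    while (my $line = <$in>) {
        chomp $line;
        my @columns = split /,/, $line, -1;    # -1 keeps trailing empty fields
        print {$out} join(',', @columns[@idxs]), "\n";
    }
    return;
}

# Usage with in-memory filehandles; real code would open files instead.
my $data   = "id,name,age\n1,Ann,30\n2,Bob,\n";
my $result = '';
open my $in,  '<', \$data   or die "Cannot read: $!";
open my $out, '>', \$result or die "Cannot write: $!";
remove_columns_by_slice($in, $out, 'name');
close $out;
print $result;
```

Because @idxs is computed once from the header, each data row costs only one split, one slice, and one join, which should matter for the huge files mentioned in the question.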
Upvotes: 1
Reputation: 41
You should probably look at Text::CSV.
Or you can do something like this:
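my $colnum;
my $header = <$file>;
chomp $header;    # otherwise the last column name keeps its newline
my @columns = split(/,/, $header);
for(my $i = 0; $i < scalar(@columns); $i++) {
    # eq instead of a regex, so metacharacters in the name are not interpreted
    if($columns[$i] eq $unwanted_column_name) {
        $colnum = $i;
        last;
    }
}
die "Column '$unwanted_column_name' not found\n" unless defined $colnum;
while(<$file>) {
    chomp;
    my @row = split(/,/, $_, -1);    # -1 keeps trailing empty fields
    splice(@row, $colnum, 1);
    #do something with resulting array @row
}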
Side note:
you really should use strict and warnings;
split(/,/, <$file>); won't work with all CSV files
Upvotes: 1
Reputation: 3631
Here are a few hints that will hopefully get you going.
To remove the element at position $index of an array, use:
splice @array,$index,1 ;
As speed is an issue, you probably want to construct an array of column numbers at the start and then loop over the elements of that array:
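# remove from the highest index down, so earlier splices do not shift later ones
for my $index (sort { $b <=> $a } @indices) {
    splice @array,$index,1 ;
}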
(This is more idiomatic Perl than a for (my $i=0; $i<=$#hdr;$i++) style loop.)
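One way to build that array of column numbers from the header is sketched below. The helper name indices_to_remove and the sample data are hypothetical. It returns the positions in descending order because each splice shifts the later elements one place to the left, so removing from the right keeps the remaining positions valid.

```perl
use strict;
use warnings;

# Hypothetical helper: find the positions of the named columns in the header.
# Returned highest first, so splicing them in order never shifts a pending index.
sub indices_to_remove {
    my ($hdr, @names) = @_;
    my %want = map { $_ => 1 } @names;
    return reverse grep { $want{ $hdr->[$_] } } 0 .. $#$hdr;
}

my @hdr     = qw(id name age city);
my @indices = indices_to_remove(\@hdr, qw(name city));

# Apply the same precomputed indices to every row.
my @row = (1, 'Ann', 30, 'Oslo');
splice @row, $_, 1 for @indices;
print join(',', @row), "\n";
```

Scanning the header positions rather than the requested names also handles a header in which the same column name appears more than once.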
Another thing to consider: the CSV format is surprisingly complicated. Might your data contain a , inside " ", such as
1,"column with a , in it"
If so, I would consider using something like Text::CSV.
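As a rough sketch of what that could look like, the following uses Text::CSV's getline, combine, and string methods to parse and rebuild each row. The sub name remove_columns_csv and the sample data are hypothetical, and the code assumes Text::CSV is installed from CPAN (it is not a core module).

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical helper: copy CSV from $in to $out without the named columns,
# letting Text::CSV handle quoting and embedded commas.
sub remove_columns_csv {
    my ($in, $out, @cols) = @_;
    my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });

    my $header = $csv->getline($in) or return;
    my %drop = map { $_ => 1 } @cols;
    my @keep = grep { !$drop{ $header->[$_] } } 0 .. $#$header;

    my $row = $header;
    while ($row) {
        # combine() re-quotes any field that needs it; string() returns the line
        $csv->combine(@{$row}[@keep]) or die "Cannot combine row\n";
        print {$out} $csv->string, "\n";
        $row = $csv->getline($in);
    }
    return;
}

# Usage with in-memory filehandles; real code would open files instead.
my $data   = qq{id,note,age\n1,"column with a , in it",30\n};
my $result = '';
open my $in,  '<', \$data   or die "Cannot read: $!";
open my $out, '>', \$result or die "Cannot write: $!";
remove_columns_csv($in, $out, 'id');
close $out;
print $result;
```

A plain split on commas would break the quoted field in the sample row into two pieces, which is the case the parser is there to avoid.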
Upvotes: 1