Reputation: 111

Search for, and remove column from CSV file

I'm trying to write a subroutine that takes two arguments: a filename and the name of a column inside that CSV file. The subroutine should search for the second argument (the column name), remove that column (or columns) from the CSV file, and then return the CSV data with those columns removed.

I feel like I've gotten through the first half of this sub (opening the file, retrieving the headers and values), but I can't seem to find a way to search the CSV file for the string that the user inputs and delete that whole column. Any ideas? Here's what I have so far.

sub remove_columns {
   my @Para = @_;
   my $nargs = @Para;
   die "Insufficient arguments\n" if ($nargs < 2);
   my ($file, $column) = @Para;

   open my $fh, '<', $file or die "Cannot open $file: $!\n";
   my $header = <$fh>;
   chomp $header;

   my @hdr = split ',',$header;

   while (my $line = <$fh>){
    chomp $line;
    my @vals = split ',',$line;

    #hash that will allow me to access column name and values quickly
    my %h;

    for (my $i=0; $i<=$#hdr;$i++){
      $h{$hdr[$i]}=$i;
    }
     ....
   }
}

Here's where the search and removal will be done. I've been thinking about how to go about this; the CSV files that I'll be modifying will be huge, so speed is a factor, but I can't seem to think of a good way to go about this. I'm new to Perl, so I'm struggling a bit.

Upvotes: 1

Answers (3)

Hynek -Pichi- Vychodil

Reputation: 26121

There is an elegant way to remove columns from an array. If the names of the columns to remove are in @cols and the headers are in @headers, I can build an array of the indexes to keep:

my %to_delete;
@to_delete{@cols} = ();
my @idxs = grep !exists $to_delete{$headers[$_]}, 0 .. $#headers;

Then it's easy to make new headers

@headers[@idxs]

and also the new row from the columns read from each line

@columns[@idxs]

The same approach can be used, for example, to rearrange arrays. It is very fast and a pretty idiomatic Perl way to do this sort of task.
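
Put together, the idea might look like the following minimal sketch. The sub name drop_columns and the sample data are made up for illustration, and the plain split on commas assumes the fields contain no quoted or embedded commas.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a ref to an array of CSV lines (header first)
# and a list of column names to drop; returns the filtered lines.
# Assumes simple CSV with no quoted or embedded commas.
sub drop_columns {
    my ($lines, @cols) = @_;

    my @headers = split /,/, $lines->[0], -1;

    # Hash slice: the keys are the names of the columns to delete.
    my %to_delete;
    @to_delete{@cols} = ();

    # Indexes of the columns to keep, in their original order.
    my @idxs = grep { !exists $to_delete{ $headers[$_] } } 0 .. $#headers;

    my @out;
    for my $line (@$lines) {
        my @columns = split /,/, $line, -1;    # -1 keeps trailing empty fields
        push @out, join ',', @columns[@idxs];  # array slice keeps wanted columns
    }
    return @out;
}

my @result = drop_columns(
    [ 'id,name,age,city', '1,Ann,30,Oslo', '2,Bob,,Rome' ],
    'age', 'city',
);
print "$_\n" for @result;
```

Because @idxs is computed once from the header, each data row costs only one split, one slice and one join, which should matter for large files.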

Upvotes: 1

darken

Reputation: 41

You should probably look in the direction of Text::CSV

Or you can do something like this:

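my $colnum;
my $header = <$file>;
chomp $header;
my @columns = split(/,/, $header);
for(my $i = 0; $i < scalar(@columns); $i++) {
    # compare with eq so that regex metacharacters in the name cannot misfire
    if($columns[$i] eq $unwanted_column_name) {
         $colnum = $i;
         last;
    }
}
die "Column '$unwanted_column_name' not found\n" unless defined $colnum;

while(<$file>) {
   chomp;
   my @row = split(/,/, $_);
   splice(@row, $colnum, 1);
   #do something with resulting array @row
}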

Side note: you really should use strict and use warnings. Also note that

split(/,/, <$file>);

won't work with all CSV files, for example those with quoted fields that contain commas.

Upvotes: 1

justintime

Reputation: 3631

Here are a few hints that will hopefully get you going.

To remove the element at position $index of an array, use:

splice @array,$index,1 ;

As speed is an issue, you probably want to build the array of column numbers once at the start and then loop over its elements. Go from the highest index down, because each splice shifts the later elements one place to the left:

for my $index (sort { $b <=> $a } @indices) {
  splice @array,$index,1 ;
}

(This is more idiomatic Perl than a for (my $i=0; $i<=$#hdr;$i++) style loop.)
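
To see the effect on concrete data, here is a small self-contained sketch; the sample row and the indices are made up for illustration. It removes positions from the highest index down, since each splice shifts the remaining elements to the left and would otherwise invalidate the positions still waiting to be removed.

```perl
use strict;
use warnings;

my @row     = qw(a b c d e);
my @indices = (1, 3);          # positions to remove: 'b' and 'd'

# Remove from the highest index down, so that earlier removals
# do not shift the positions still waiting to be removed.
for my $index (sort { $b <=> $a } @indices) {
    splice @row, $index, 1;
}

print join(',', @row), "\n";   # prints a,c,e
```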

Another thing to consider: the CSV format is surprisingly complicated. Might your data contain a , inside " ", such as

1,"column with a , in it"

I would consider using something like Text::CSV
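
As a rough sketch of what that could look like, the following uses Text::CSV's parse, fields, combine and string methods to drop a column while handling quoted commas correctly. Text::CSV is a CPAN module and has to be installed separately; the sample lines and the column name 'score' are assumptions made for illustration, and in practice the lines would be read from the file.

```perl
use strict;
use warnings;
use Text::CSV;   # CPAN module, not part of core Perl

# Hypothetical sample data; in practice the lines would come from a file.
my @lines = (
    'id,note,score',
    '1,"column with a , in it",10',
    '2,plain,20',
);
my %to_delete = map { $_ => 1 } ('score');   # column names to drop

my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

# Parse the header once and work out which positions to keep.
$csv->parse($lines[0]) or die "Cannot parse header\n";
my @headers = $csv->fields;
my @idxs    = grep { !$to_delete{ $headers[$_] } } 0 .. $#headers;

my @out;
for my $line (@lines) {
    $csv->parse($line) or die "Cannot parse line: $line\n";
    my @fields = $csv->fields;

    # Rebuild the line from the kept fields; combine re-quotes where needed.
    $csv->combine(@fields[@idxs]) or die "Cannot combine fields\n";
    push @out, $csv->string;
}
print "$_\n" for @out;
```

The quoted field survives as a single column here, which a plain split on commas would not manage.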

Upvotes: 1
