Neel
Neel

Reputation: 97

How can I filter out specific column from a CSV file in Perl?

I am just a beginner in Perl and need some help in filtering columns using a Perl script. I have about 10 columns separated by comma in a file and I need to keep 5 columns in that file and get rid of every other columns from that file. How do we achieve this?

Thanks a lot for anybody's assistance.

cheers, Neel

Upvotes: 0

Views: 11731

Answers (10)

haytona
haytona

Reputation: 21

You can use some of Perl's built in runtime options to do this on the command line:

$ echo "1,2,3,4,5" | perl -a -F, -n -e 'print join(q{,}, $F[0], $F[3]).qq{\n}' 1,4

The above will -a(utosplit) using the -F(ield) of a comma. It will then join the fields you are interested in and print them back out (with a line separator). This assumes simple data without nested comma's. I was doing this with an unprintable field separator (\x1d) so this wasn't an issue for me.

See http://perldoc.perl.org/perlrun.html#Command-Switches for more details.

Upvotes: 2

JVeldhuis
JVeldhuis

Reputation:

Went looking didn't find a nice csv compliant filter program thats flexible to be useful for than just a one-of, so I wrote one. Enjoy.

Basic usage is:

bash$ csvfilter [-r <columnTitle>]* [-quote] <csv.file>

#!/usr/bin/perl

use strict;
use warnings;
use Getopt::Long;

use Text::CSV;

my $always_quote=0;

my @remove;
if ( ! GetOptions('remove:s'=> \@remove,
          'quote-always'=>sub {$always_quote=1;}) ) {
   die "$0:invalid option (use --remove  [--quote-always])";
}

my @cols2remove;

sub filter(@)
{
   my @fields=@_;
   my @r;
   my $i=0;
   for my $c (@cols2remove) {
       my $p;
       #if ( $i  $i ) {
       push(@r, splice(@fields, $i));
   }
   return @r;
}

# create just one if these
my $csvOut=new Text::CSV({always_quote=>$always_quote});

sub printLine(@)
{
    my @fields=@_;
    my $combined=$csvOut->combine(filter(@fields));
    my $str=$csvOut->string();
    if ( length($str) ) {
     print "$str\n";
    }
}

my $csv = Text::CSV->new();

my $od;
open($od, "| cat") || die "output:$!";
while () {
    $csv->parse($_);
    if ( $. == 1 ) {
    my $failures=0;
    my @cols=$csv->fields;
    for my $rm (@remove) {
        for (my $c=0; $c$b} @cols2remove);
    }
    printLine($csv->fields);
}

exit(0);
\

Upvotes: 1

Jack M.
Jack M.

Reputation: 32070

My personal favorite way to do CSV is using the AnyData module. It seems to make things pretty simple, and removing a named column can be done rather easily. Take a look on CPAN.

Upvotes: 0

Shlomi Fish
Shlomi Fish

Reputation: 4510

In addition to what people here said about processing comma-separated files, I'd like to note that one can extract the even (or odd) array elements using an array slice and/or map:

@myarray[map { $_ * 2 } (0 .. 4)]

Hope it helps.

Upvotes: 0

Paul Tomblin
Paul Tomblin

Reputation: 182782

Have a look at Text::CSV (or Text::CSV_XS) to parse CSV files in Perl. It's available on CPAN or you can probably get it through your package manager if you're using Linux or another Unix-like OS. In Ubuntu the package is called libtext-csv-perl.

It can handle cases like fields that are quoted because they contain a comma, something that a simple split command can't handle.

Upvotes: 20

Sparr
Sparr

Reputation: 7712

This answers a much larger question, but seems like a good relevant bit of information.

The unix cut command can do what you want (and a whole lot more). It has been reimplemented in Perl.

Upvotes: -3

oylenshpeegul
oylenshpeegul

Reputation: 3424

Alternatively, you could use Text::ParseWords, which is in the standard library. Add

use Text::ParseWords;

to the top of Pax's example above, and then substitute

  my @fields = parse_line(q{,}, 0, $_);

for the split.

Upvotes: 2

PolyThinker
PolyThinker

Reputation: 5218

If you are talking about CSV files in windows (e.g., generated from Excel), you will need to be careful to take care of fields that contain comma themselves but are enclosed by quotation marks.

In this case, a simple split won't work.

Upvotes: 4

Josh Lee
Josh Lee

Reputation: 177594

CSV is an ill-defined, complex format (weird issues with quoting, commas, and spaces). Look for a library that can handle the nuances for you and also give you conveniences like indexing by column names.

Of course, if you're just looking to split a text file by commas, look no further than @Pax's solution.

Upvotes: 6

paxdiablo
paxdiablo

Reputation: 881523

Use split to pull the line apart then output the ones you want (say every second column), create the following xx.pl file:

while(<STDIN>) {
    chomp;
    @fields = split (",",$_);
    print "$fields[1],$fields[3],$fields[5],$fields[7],$fields[9]\n"
}

then execute:

$ echo 1,2,3,4,5,6,7,8,9,10 | perl xx.pl
2,4,6,8,10

Upvotes: 5

Related Questions