Reputation: 97
I am just a beginner in Perl and need some help in filtering columns using a Perl script. I have about 10 columns separated by comma in a file and I need to keep 5 columns in that file and get rid of every other columns from that file. How do we achieve this?
Thanks a lot for anybody's assistance.
cheers, Neel
Upvotes: 0
Views: 11731
Reputation: 21
You can use some of Perl's built in runtime options to do this on the command line:
$ echo "1,2,3,4,5" | perl -a -F, -n -e 'print join(q{,}, $F[0], $F[3]).qq{\n}'
1,4
The above will -a(utosplit) using the -F(ield) of a comma. It will then join the fields you are interested in and print them back out (with a line separator). This assumes simple data without nested comma's. I was doing this with an unprintable field separator (\x1d) so this wasn't an issue for me.
See http://perldoc.perl.org/perlrun.html#Command-Switches for more details.
Upvotes: 2
Reputation:
Went looking didn't find a nice csv compliant filter program thats flexible to be useful for than just a one-of, so I wrote one. Enjoy.
Basic usage is:
bash$ csvfilter [-r <columnTitle>]* [-quote] <csv.file>
#!/usr/bin/perl use strict; use warnings; use Getopt::Long; use Text::CSV; my $always_quote=0; my @remove; if ( ! GetOptions('remove:s'=> \@remove, 'quote-always'=>sub {$always_quote=1;}) ) { die "$0:invalid option (use --remove [--quote-always])"; } my @cols2remove; sub filter(@) { my @fields=@_; my @r; my $i=0; for my $c (@cols2remove) { my $p; #if ( $i $i ) { push(@r, splice(@fields, $i)); } return @r; } # create just one if these my $csvOut=new Text::CSV({always_quote=>$always_quote}); sub printLine(@) { my @fields=@_; my $combined=$csvOut->combine(filter(@fields)); my $str=$csvOut->string(); if ( length($str) ) { print "$str\n"; } } my $csv = Text::CSV->new(); my $od; open($od, "| cat") || die "output:$!"; while () { $csv->parse($_); if ( $. == 1 ) { my $failures=0; my @cols=$csv->fields; for my $rm (@remove) { for (my $c=0; $c$b} @cols2remove); } printLine($csv->fields); } exit(0); \
Upvotes: 1
Reputation: 32070
My personal favorite way to do CSV is using the AnyData module. It seems to make things pretty simple, and removing a named column can be done rather easily. Take a look on CPAN.
Upvotes: 0
Reputation: 4510
In addition to what people here said about processing comma-separated files, I'd like to note that one can extract the even (or odd) array elements using an array slice and/or map:
@myarray[map { $_ * 2 } (0 .. 4)]
Hope it helps.
Upvotes: 0
Reputation: 182782
Have a look at Text::CSV (or Text::CSV_XS) to parse CSV files in Perl. It's available on CPAN or you can probably get it through your package manager if you're using Linux or another Unix-like OS. In Ubuntu the package is called libtext-csv-perl.
It can handle cases like fields that are quoted because they contain a comma, something that a simple split command can't handle.
Upvotes: 20
Reputation: 7712
This answers a much larger question, but seems like a good relevant bit of information.
The unix cut command can do what you want (and a whole lot more). It has been reimplemented in Perl.
Upvotes: -3
Reputation: 3424
Alternatively, you could use Text::ParseWords, which is in the standard library. Add
use Text::ParseWords;
to the top of Pax's example above, and then substitute
my @fields = parse_line(q{,}, 0, $_);
for the split.
Upvotes: 2
Reputation: 5218
If you are talking about CSV files in windows (e.g., generated from Excel), you will need to be careful to take care of fields that contain comma themselves but are enclosed by quotation marks.
In this case, a simple split won't work.
Upvotes: 4
Reputation: 177594
CSV is an ill-defined, complex format (weird issues with quoting, commas, and spaces). Look for a library that can handle the nuances for you and also give you conveniences like indexing by column names.
Of course, if you're just looking to split a text file by commas, look no further than @Pax's solution.
Upvotes: 6
Reputation: 881523
Use split to pull the line apart then output the ones you want (say every second column), create the following xx.pl file:
while(<STDIN>) {
chomp;
@fields = split (",",$_);
print "$fields[1],$fields[3],$fields[5],$fields[7],$fields[9]\n"
}
then execute:
$ echo 1,2,3,4,5,6,7,8,9,10 | perl xx.pl
2,4,6,8,10
Upvotes: 5