ZeldaElf

Reputation: 333

Perl and Regex - Parsing values from a .csv

I need to create a Perl script that reads the most recently modified file in a given folder (the file is always a .csv) and parses the values from its columns, so I can load them into a MySQL database.

The main problem is: I need to separate the date from the hours, and the country from the names (CHN, DEU and JPN represent China, Germany and Japan).

They come together like in the example below:

"02/12/2014 09:00:00","3600","1","CHN - NAME1","0%","0%"
"02/12/2014 09:00:00","3600","1","DEU - NAME2","10%","75.04%"
"02/12/2014 09:00:00","3600","1","JPN - NAME3","0%","100%"

So far I can split the lines, but how can I make it understand that each value enclosed in "" and separated by , should be inserted into my arrays?

my %date;
my %hour;
my %country;
my %name;
my %percentage_one;
my %percentage_two;

# Selects latest file in the given directory
my $files = File::DirList::list('/home/cvna/IN/SCRIPTS/zabbix/roaming/tratamento_IAS/GPRS_IN', 'M');
my $file = $files->[0]->[13];

open(CONFIG_FILE,$file);
while (<CONFIG_FILE>){
    # Splits the file into various lines
    @lines = split(/\n/,$_);
    # For each line that i get...
    foreach my $line (@lines){
        # I need to split the values between , without the ""
        # And separating Hour from Date, and Name from Country
        @aux = split(/......./,$line)
    }
}
close(CONFIG_FILE);

Upvotes: 2

Views: 97

Answers (2)

David W.

Reputation: 107040

Looking at your code, it appears you're pretty new to Perl. The Text::CSV module is a nice solution, but unfortunately, isn't a standard module. You'll need to use CPAN to install it. It isn't difficult, but may require you to be the administrator of your computer.

The module Text::ParseWords is a standard module and can handle quoted words much like Text::CSV can.

You'll basically need to split the line, which I do with the parse_line function. The first parameter is the delimiter (here a comma) that I want to split the line on. The second parameter is the keep flag: 0 strips the quotes from the fields. Unlike split itself, parse_line doesn't split on delimiters that sit inside quoted fields, and it handles backslash-escaped quotes. This is very similar to what Text::CSV does.
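To illustrate the difference from split, here is a minimal sketch. The sample line is made up for the comparison: its fourth field contains a comma inside the quotes, which your real data doesn't show.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical sample line: the fourth field has a comma inside the quotes.
my $line = '"02/12/2014 09:00:00","3600","1","CHN - NAME1, INC","0%","0%"';

# Plain split cuts at every comma, including the quoted one,
# and leaves the quote characters in the fields.
my @naive = split /,/, $line;

# parse_line(delimiter, keep, line): with keep set to 0 the quotes are
# removed, and delimiters inside quoted fields are left alone.
my @fields = parse_line(',', 0, $line);

print scalar(@naive),  " fields from split\n";
print scalar(@fields), " fields from parse_line\n";
print "Fourth field: $fields[3]\n";
```

With this line, split produces seven pieces while parse_line produces the six fields you actually want.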

Once you've split your line, you'll need to split date from time and country from name. In my example, I show two ways of doing this: One uses split and the other uses a matching regular expression. Either one will work.

use strict;             # Lets you know when you misspell variable names
use warnings;           # Warns of issues (e.g. using undefined variables)
use feature qw(say);    # Lets you use 'say' instead of 'print' (No \n needed)
use Text::ParseWords;

while ( my $line = <DATA> ) {
    chomp $line;        # Otherwise the last field keeps the trailing newline
    my ($date_time, $foo, $bar, $country_name, $percent1, $percent2)
            = parse_line ',', 0, $line;
    my ($date, $time) = split /\s+/, $date_time;
    my ($country, $name) = $country_name =~ m/(.+) - (.*)/;
    say "$date, $time, $country, $name";
}

__DATA__
"02/12/2014 09:00:00","3600","1","CHN - NAME1","0%","0%"
"02/12/2014 09:00:00","3600","1","DEU - NAME2","10%","75.04%"
"02/12/2014 09:00:00","3600","1","JPN - NAME3","0%","100%"

In your actual program, you'll open your file, and make sure you've opened that file. You can test for that, or use autodie:

use strict;             # Lets you know when you misspell variable names
use warnings;           # Warns of issues (e.g. using undefined variables)
use feature qw(say);    # Lets you use 'say' instead of 'print' (No \n needed)
use Text::ParseWords;
use autodie;

open my $config_file, "<", $file;  # No need for testing thanks to use autodie!

# What you need to do if you don't use autodie
# open my $config_file, "<", $file or die qq(Can't open "$file" for reading);

while ( my $line = <$config_file> ) {
    chomp $line;        # Otherwise the last field keeps the trailing newline
    my ($date_time, $foo, $bar, $country_name, $percent1, $percent2)
            = parse_line ',', 0, $line;
    my ($date, $time) = split /\s+/, $date_time;
    my ($country, $name) = $country_name =~ m/(.+) - (.*)/;
    say "$date, $time, $country, $name";  # Show fields were correctly parsed.
}

It looks like you want to store the data. I see you have multiple hashes that I bet you're trying to keep in parallel. Take a look at how you can use references, which allow you to build more complex structures:

my %data;   # Where I'll be storing the data...
$data{$key}->{DATE} = $date;
$data{$key}->{HOUR} = $time;
$data{$key}->{COUNTRY} = $country;
...

Now, all of your data is in %data. You can pass it around from place to place in your program, and not worry whether you've updated each and every single hash.
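As a rough sketch of how that might look once the fields are parsed: the rows below are hard-coded from your sample data, and using the name as the hash key is purely an assumption on my part (pick whatever uniquely identifies a row). The PERCENT_ONE and PERCENT_TWO keys are likewise made-up names.

```perl
use strict;
use warnings;

# Rows as they might look after parsing (values from the sample data).
my @rows = (
    [ '02/12/2014', '09:00:00', 'CHN', 'NAME1', '0%',  '0%'     ],
    [ '02/12/2014', '09:00:00', 'DEU', 'NAME2', '10%', '75.04%' ],
    [ '02/12/2014', '09:00:00', 'JPN', 'NAME3', '0%',  '100%'   ],
);

my %data;
for my $row (@rows) {
    my ($date, $time, $country, $name, $percent1, $percent2) = @$row;

    # Assumption: the name is unique, so it can serve as the key.
    $data{$name} = {
        DATE        => $date,
        HOUR        => $time,
        COUNTRY     => $country,
        PERCENT_ONE => $percent1,
        PERCENT_TWO => $percent2,
    };
}

# Everything about one record now travels together.
for my $name (sort keys %data) {
    my $record = $data{$name};
    print "$name: $record->{COUNTRY} on $record->{DATE} at $record->{HOUR}\n";
}
```

Each value in %data is a reference to an anonymous hash, so one lookup by key gives you the whole record instead of six separate hash lookups.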

Once you get the hang of references, you are on your way to writing Object Oriented Perl code.

Get a good book on Modern Perl too. Perl coding techniques have changed quite a bit since Perl 5 was released. Unfortunately, most people never learn the way Perl should be written because they learn from old books that are lying around, or from looking at older code written in the Perl 3 and Perl 4 error (pun intended). Perl is a flexible and powerful language that quickly gives you enough rope to hang yourself. Learning good programming techniques will allow you to write more complex and comprehensive programs that are actually easier to read and maintain.


Almost complete program...

Here's the complete program that finds the most recent file in a particular directory, then reads in that file and parses the lines.

I'm using the -M file test. It returns the age of the file in days, measured from the file's last modification to the moment the program started. For example, a file last modified 2 1/2 days ago returns 2.5, while a file last modified one day and four hours ago returns 1.16666667. You can use this to compare the ages of the various files: the smaller the value, the newer the file.

This program works with Perl 5.8.8 without installing any new modules, and I've tested it with data I've made up.

You can see I use open ... or die ...; without any issues. Are you getting some other error? Do you have use strict; and use warnings; set in your program?

#! /usr/bin/env perl
#

use strict;             # Lets you know when you misspell variable names
use warnings;           # Warns of issues (e.g. using undefined variables)
use Text::ParseWords;
use Benchmark;

use constant {
    DATA_FILE_DIR => "temp",
};

#
# Find newest file in the directory
#

opendir my $data_dir, DATA_FILE_DIR
        or die qq(Cannot open directory for reading: $!);

my $newest_file;
while ( my $file = readdir $data_dir ) { 
    next if $file eq "." or $file eq "..";
    my $full_name = DATA_FILE_DIR . "/" . $file;
    if ( not defined $newest_file
            or -M $full_name < -M $newest_file ) {
        $newest_file = $full_name;
    }
}
print qq(Using file "$newest_file"\n);
closedir $data_dir;

open my $file, "<", $newest_file
        or die qq(Cannot open file "$newest_file" for reading: $!);
while ( my $line = <$file> ) {
    chomp $line;        # Otherwise the last field keeps the trailing newline
    # Read in the entire line
    my ($date_time, $foo, $bar, $country_name, $percent1, $percent2)
            = parse_line ',', 0, $line;
    # Split the DATE/TIME field
    my ($date, $time) = split /\s+/, $date_time;

    # Split the Country/Name field
    my ($country, $name) = $country_name =~ m/(.+) - (.*)/;

    # Print statement merely shows that these four fields are truly split.
    print "$date, $time, $country, $name\n";
}
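
As a variation on the directory loop above, the same -M comparison can be written as a sort. This is only a sketch: newest_file is a made-up helper name, and it assumes you only care about plain files (anything that isn't a regular file is skipped).

```perl
use strict;
use warnings;

# Return the most recently modified plain file in $dir,
# or undef if the directory contains no plain files.
# newest_file is a hypothetical helper, not part of the program above.
sub newest_file {
    my ($dir) = @_;

    opendir my $dh, $dir
            or die qq(Cannot open directory "$dir" for reading: $!);
    # Build full paths and keep only regular files (this drops . and ..)
    my @files = grep { -f $_ } map { $dir . "/" . $_ } readdir $dh;
    closedir $dh;

    # -M gives the age in days, so the smallest value is the newest file.
    my ($newest) = sort { -M $a <=> -M $b } @files;
    return $newest;
}
```

You would call it as my $newest_file = newest_file(DATA_FILE_DIR); and then check that the result is defined before opening it. For a folder with a handful of files the sort costs nothing noticeable; the explicit loop avoids sorting and is the better fit for very large directories.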

Upvotes: 1

choroba

Reputation: 241868

In a while loop, readline or <> reads only one line at a time, so there's no need to split it on newlines. But instead of fixing your code, use Text::CSV:

#!/usr/bin/perl
use 5.010;
use warnings;
use strict;

use Text::CSV;

my $csv = 'Text::CSV'->new({ binary => 1 }) or die 'Text::CSV'->error_diag;

while (my $row = $csv->getline(*DATA)) {
    my ($date, $time)    = split / /,   $row->[0];
    my ($country, $name) = split / - /, $row->[3];
    print "Date: $date\tTime: $time\tCountry: $country\tName: $name\n";
}

__DATA__
"02/12/2014 09:00:00","3600","1","CHN - NAME1","0%","0%"
"02/12/2014 09:00:00","3600","1","DEU - NAME2","10%","75.04%"
"02/12/2014 09:00:00","3600","1","JPN - NAME3","0%","100%"

Upvotes: 5
