Reputation: 1057
I'm looking to convert some strings in a CSV which are in 0000-2400 hour format to 00-24 hour format. e.g.
2011-01-01,"AA",12478,31703,12892,32575,"0906",-4.00,"1209",-26.00,2475.00
2011-01-02,"AA",12478,31703,12892,32575,"0908",-2.00,"1236",1.00,2475.00
2011-01-03,"AA",12478,31703,12892,32575,"0907",-3.00,"1239",4.00,2475.00
The 7th and 9th columns are departure and arrival times, respectively. Preferably the lines should look like this when I'm done:
2011-01-01,"AA",12478,31703,12892,32575,"09",-4.00,"12",-26.00,2475.00
The whole csv will eventually be imported into R and I want to try and handle some of the processing beforehand because it will be kinda large. I initially attempted to do this with Perl but I'm having trouble picking out multiple digits w/ a regex. I can get a single digit before a given comma with a lookbehind expression, but not more than one.
I'm also open to being told that doing this in Perl is needlessly silly and I should stick to R. :)
Upvotes: 3
Views: 647
Reputation: 67940
Like I mentioned in the comments, using a CSV module like Text::CSV is a safe option. This is a quick sample script of how its used. You'll notice that it does not preserve quotes, though it should, since I put in keep_meta_info
. If it's important to you, I'm sure there's a way to fix it.
use strict;
use warnings;
use Data::Dumper;
use Text::CSV;
my $csv = Text::CSV->new({
binary => 1,
eol => $/,
keep_meta_info => 1,
});
while (my $row = $csv->getline(*DATA)) {
for ($row->[6], $row->[8]) {
s/\d\d\K\d\d//;
}
$csv->print(*STDOUT, $row);
}
__DATA__
2011-01-01,"AA",12478,31703,12892,32575,"0906",-4.00,"1209",-26.00,2475.00
2011-01-02,"AA",12478,31703,12892,32575,"0908",-2.00,"1236",1.00,2475.00
2011-01-03,"AA",12478,31703,12892,32575,"0907",-3.00,"1239",4.00,2475.00
Output:
2011-01-01,AA,12478,31703,12892,32575,09,-4.00,12,-26.00,2475.00
2011-01-02,AA,12478,31703,12892,32575,09,-2.00,12,1.00,2475.00
2011-01-03,AA,12478,31703,12892,32575,09,-3.00,12,4.00,2475.00
Upvotes: 2
Reputation: 126772
I may as well offer my own solution to this, which is
s/"(\d\d)\d\d"/"$1"/g
Upvotes: 3