DanH
DanH

Reputation: 5818

Perl: Is this good or bad regex, and how to improve it?

I'm trying to capture the temperature output of sensors, for which I have the following relevant lines:

temp1:       +39.5 C  (crit = +105.0 C)
Core 0:      +40.0 C  (high = +100.0 C, crit = +100.0 C)
Core 1:      +40.0 C  (high = +100.0 C, crit = +100.0 C)

I only need the first temperature of each line (39.5, 40.0, 40.0). The issue of course is that I can't really on the word number since there's an extra space in "Core 0" / "Core 1".

I've come up with the following regex which does work, however I've been told that use of .* is a somewhat lazy and dirty approach to regex.

$core_data =~ s/^.*\+(.*)C\ .*$/$1/g;

I was wondering, is there a tighter or better way to accomplish this or am I doing OK?

Upvotes: 6

Views: 380

Answers (6)

David Raab
David Raab

Reputation: 4488

I would write a general function that parses the input and returns you a hash. In General i would use this regex:

m/\A ([^:]+) : \s+ ([+-][0-9.]+) /xms

This matches a line. In $1 is what matches (i.e: "Core 0") and in $2 the temperature. I also would do a conversion from string to a number it would end with something like this:

my $temp_string = q{
temp1:       +39.5 C  (crit = +105.0 C)
Core 0:      +40.0 C  (high = +100.0 C, crit = +100.0 C)
Core 1:      +40.0 C  (high = +100.0 C, crit = +100.0 C)
Core 2:      -40.0 C  (high = +100.0 C, crit = +100.0 C)
};

my $temps = parse_temps($temp_string);

print "temp1:  ", $temps->{temp1}, "\n";
print "Core 0: ", $temps->{core0}, "\n";
print "Core 1: ", $temps->{core1}, "\n";
print "Core 2: ", $temps->{core2}, "\n";


sub parse_temps {
    my ( $str ) = @_;
    my %temp;
    for my $line ( split /\n/, $str ) {
        if ( $line =~ m/\A ([^:]+) : \s+ ([+-][0-9.]+) /xms ) {
            my $key   = $1;
            my $value = $2;

            $key   =~ s/\s+//g;
            $temp{ lc $key } = 0+$value;
        }
    }
    return wantarray ? %temp : \%temp;
}

The output of the program:

temp1:  39.5
Core 0: 40
Core 1: 40
Core 2: -40

Upvotes: 1

Toto
Toto

Reputation: 91430

A more concise regex

/\+(\d+\.?\d*) C/

this will match the first temperature with optional decimal value.

#!/usr/bin/perl
use strict;
use warnings;

my $re = qr{\+(\d+\.?\d*) C};
while (my $line = <DATA>){
    $line =~$re and print $1,"\n";
}
__DATA__
temp1:       +39.5 C  (crit = +105.0 C)
Core 0:      +40.0 C  (high = +100.0 C, crit = +100.0 C)
Core 1:      +40.0 C  (high = +100.0 C, crit = +100.0 C)

output:

39.5
40.0
40.0

Upvotes: 7

TLP
TLP

Reputation: 67908

This looks more suited to a split than a regex. split will clear all the unnecessary whitespace automatically, and you do not need to plan ahead for changes in the data.

my $tag;
($tag, $core_data) = split (/:/, $core_data);
my @fields = split (/\s/, $core_data);
my $temp   = $fields[0];

That will store the strings "+39.5", and "+40.0" in the different example lines, which can be converted to a number automagically, I believe.

Also, you will have easy access to the label of the line in $tag.

If you want, you can chop off the added info inside the parentheses with a regex:

if ($core_data =~ s/\(([^\)]*)\)//) {
    my $tmp = $1;
    $tmp =~ s/[\s\+C]//g; # clear away junk
    %data = split (/=/, (split (/,/, $tmp)));
}
for my $key (keys %data) {
    printf "%-7s = %s\n", $key, $data{$key};
}

Upvotes: 2

CrystaLight
CrystaLight

Reputation: 56

I don't understand why you're doing a search and replace with your regex (s///g), if you're just trying to capture the first temperature. Your regex appears to rely on .* being greedy. Assuming you can rely on the name: temp C (... format, this regex will work without having to match the whole string:

$core_data =~ m/^(?:\w*\b)*:\s*(\+?\d+\.\d+)/;

... or to capture without the + in front:

$core_data =~ m/^(?:\w*\b)*:\s*\+?(\d+\.\d+)/;

Upvotes: 3

sergio
sergio

Reputation: 69027

IMHO, .* is perfectly fine when it makes sense, although when when you can narrow it down to something more specific, then all the better.

In your case, you could say

S/^[^+]+\+([0-9.]) C.*$/$1/g

In this regex, I focus on what I am looking for and characterize the temperature as a sequence of digits with a point somewhere while the rest is just not relevant to me. Since you have two temperatures in each line, and you only want the first one, I used [^+] at the beginning, which matches everything that is not a +, so it will stop right where the first temperature starts. once I got the temperature, I gobble everything out using .* up to the end of the line.

This is just an example of reasoning, it does not pretend to be the best regex you can come up with to solve your problem.

Upvotes: 2

VGE
VGE

Reputation: 4191

A more precise regex

 $core_data =~ s/^.*\+([\d.]+ )C\ .*$/$1/g;

But probably the following is enough because only the numeical value seems to be interesting.

 $cpu_head = $1 if m/:\s*\+([\d.]+) C/;

Note : \s stands for any space and \d for any digit.

Upvotes: 2

Related Questions