Huy Nguyen

Reputation: 106

Regex Crafting SMART data

I'm racking my brain trying to come up with a regex that can pull the data I want from this SMART data output:

Offline data collection status:  (0x00) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        (  139) seconds.
Offline data collection
capabilities:            (0x73) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    ( 100) minutes.
Conveyance self-test routine
recommended polling time:    (   3) minutes.
SCT capabilities:          (0x1081) SCT Status supported.

The regex I've come up with so far is:

/([^A-Za-z]?:)([\w\s\/().\-]+\.)/gm

The goal of my regex is to get the value of each entry in the "General SMART Values" section of smartctl -a output. The problem is that the output is wrapped across lines in a way that makes it difficult to pull the values I want into an array.

I'm already able to pull just the keys, such as Offline data collection status or Self-test execution status, so now I'm working on pulling the value of each of those parameters, which would be something like (139) seconds or (0x00) Offline data collection activity was never started.

What separates a key from its value is a colon followed by some whitespace. However, one of the values contains text that also has a colon in it, which makes the capture extremely difficult. I need to capture all of the following without accidentally capturing the next parameter's value.

Offline data collection status:  (0x00) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        (  139) seconds.

So from the above I need to capture just the following.

(0x00)  Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.

The capture must not run on into Self-test execution status:, since that is the next parameter's key.

Any help or thoughts on this would be appreciated.

Upvotes: 0

Views: 100

Answers (3)

cdlane

Reputation: 41872

Both keys and data are split across lines so we have to handle both cases:

use strict;
use warnings;

my %data;

my $lastkey;

my $prefixkey = "";

while (my $smartdata = <DATA>) {
    chomp $smartdata;

    if ($smartdata =~ m/^\S/) {
        if ($smartdata =~ m/^([^:]+):\s+(.*)$/) { # is a complete or end of a key and data

            $lastkey = $prefixkey ? "$prefixkey $1" : $1;

            $data{$lastkey} = $2;

            $prefixkey = "";
        }
        else { # this is the start of a key that continues on the next line
            $smartdata =~ s/\s+$//; # strip trailing whitespace
            $prefixkey = $smartdata;
        }
    }
    else { # this is a data continuation
        $smartdata =~ s/^\s+//; # strip the leading indentation
        $data{$lastkey} .= " $smartdata";
    }
}

for my $key (keys(%data)) {
    print("$key:\t$data{$key}\n");
}

__DATA__
Offline data collection status:  (0x00) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        (  139) seconds.
Offline data collection
capabilities:            (0x73) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    ( 100) minutes.
Conveyance self-test routine
recommended polling time:    (   3) minutes.
SCT capabilities:          (0x1081) SCT Status supported.

Produces:

Error logging capability:   (0x01) Error logging supported. General Purpose Logging supported.
Total time to complete Offline data collection: (  139) seconds.
SCT capabilities:   (0x1081) SCT Status supported.
Offline data collection capabilities:   (0x73) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. No Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Conveyance self-test routine recommended polling time:  (   3) minutes.
Self-test execution status: (   0) The previous self-test routine completed without error or no self-test has ever  been run.
Extended self-test routine recommended polling time:    ( 100) minutes.
Offline data collection status: (0x00) Offline data collection activity was never started. Auto Offline Data Collection: Disabled.
Short self-test routine recommended polling time:   (   2) minutes.
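
The listing above is not in input order because Perl hashes do not preserve insertion order. If the original order matters, one option is to record each key in an array as it is first seen. The following is only a minimal sketch of that idea under the same line-based assumptions as the code above; the function name parse_in_order and the @order array are hypothetical and not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: parses the text line by line and returns
# (\@order, \%data), where @order lists the keys in input order.
sub parse_in_order {
    my ($text) = @_;
    my ( %data, @order, $lastkey );
    my $prefixkey = "";

    for my $line ( split /\n/, $text ) {
        if ( $line =~ m/^\S/ ) {
            if ( $line =~ m/^([^:]+):\s+(.*)$/ ) {    # complete key, or end of a wrapped key
                $lastkey = $prefixkey ? "$prefixkey $1" : $1;
                push @order, $lastkey unless exists $data{$lastkey};
                $data{$lastkey} = $2;
                $prefixkey = "";
            }
            else {                                     # first part of a wrapped key
                ( $prefixkey = $line ) =~ s/\s+$//;
            }
        }
        elsif ( defined $lastkey && $line =~ /\S/ ) {  # indented continuation of a value
            $line =~ s/^\s+|\s+$//g;
            $data{$lastkey} .= " $line";
        }
    }
    return ( \@order, \%data );
}

my $sample = join "\n",
    'Short self-test routine ',
    'recommended polling time:    (   2) minutes.',
    'Error logging capability:        (0x01) Error logging supported.',
    '                    General Purpose Logging supported.';

my ( $order, $data ) = parse_in_order($sample);
print "$_:\t$data->{$_}\n" for @$order;
```

With this sample, the polling-time entry is printed before the error-logging entry, matching the input.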

Upvotes: 1

user557597

I think you could take advantage of the fact that the keys start at the beginning
of a line and the values always have at least one horizontal whitespace character
before them.

(?m)((?:^(?!\s)[^:\n]*\n?)+):(\h+.*?(?:\n|\z)(?:^\h+.*?(?:\n|\z))*)?

No separate modifiers are needed; the (?m) is included in the pattern itself.

while ( $smartdata =~ /(?m)((?:^(?!\s)[^:\n]*\n?)+):(\h+.*?(?:\n|\z)(?:^\h+.*?(?:\n|\z))*)?/g )
{
    push @key, $1;
    push @value, $2;
}

Expanded

 (?m)
 (                             # (1 start), Key
      (?:
           ^ 
           (?! \s )
           [^:\n]* 
           \n? 
      )+
 )                             # (1 end)
 : 
 (                             # (2 start), Value
      \h+ .*?  
      (?: \n | \z )
      (?:
           ^ \h+ .*?  
           (?: \n | \z )
      )*
 )?                            # (2 end)
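
The captured keys and values keep their embedded newlines and indentation. As a rough sketch only, the expanded form can be used directly with the /x flag and the captures flattened afterwards. The helpers squeeze and parse_smart are hypothetical names introduced for illustration, and \h and // assume Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: join wrapped lines with a single space and trim
# both ends, leaving spacing inside a line (e.g. "(  139)") untouched.
sub squeeze {
    my ($text) = @_;
    $text =~ s/\h*\n\h*/ /g;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

# Hypothetical helper: returns a list of [key, value] pairs in input order.
sub parse_smart {
    my ($smartdata) = @_;
    my @pairs;
    while ( $smartdata =~ m{
            (?m)
            (                   # (1) key: one or more unindented lines
                (?: ^ (?! \s ) [^:\n]* \n? )+
            )
            :
            (                   # (2) value: rest of the line plus indented lines
                \h+ .*? (?: \n | \z )
                (?: ^ \h+ .*? (?: \n | \z ) )*
            )?
        }xg )
    {
        # Copy the captures before calling code that runs other regexes.
        my ( $key, $value ) = ( $1, $2 // '' );
        push @pairs, [ squeeze($key), squeeze($value) ];
    }
    return @pairs;
}

my $sample = join "\n",
    'Total time to complete Offline ',
    'data collection:        (  139) seconds.',
    'SMART capabilities:            (0x0003) Saves SMART data before entering',
    '                    power-saving mode.',
    '';

for my $pair ( parse_smart($sample) ) {
    print "$pair->[0] => $pair->[1]\n";
}
```

Using m{...}x rather than /.../x avoids having to escape slashes and leaves room for the comments inside the pattern.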

Perl sample

use strict;
use warnings;

$/ = undef;

my $smartdata = <DATA>;

my @key = ();
my @val = ();

while ( $smartdata =~ /(?m)((?:^(?!\s)[^:\n]*\n?)+):(\h+.*?(?:\n|\z)(?:^\h+.*?(?:\n|\z))*)?/g )
{
    push @key, $1;
    if (defined $2 ) {
        push @val, $2;
    }
    else {
        push @val, '';
    }
}

for ( 0 .. $#key )
{
     print "key $_ = $key[$_]\n";
     print "value = $val[$_]\n-------------------\n";
}

__DATA__

Offline data collection status:  (0x00) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        (  139) seconds.
Offline data collection
capabilities:            (0x73) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.



Extended self-test routine
recommended polling time:    ( 100) minutes.
Conveyance self-test routine
recommended polling time:    (   3) minutes.
SCT capabilities:          (0x1081) SCT Status supported.

Output

key 0 = Offline data collection status
value =   (0x00) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.

-------------------
key 1 = Self-test execution status
value =       (   0) The previous self-test routine completed
                    without error or no self-test has ever
                    been run.

-------------------
key 2 = Total time to complete Offline
data collection
value =         (  139) seconds.

-------------------
key 3 = Offline data collection
capabilities
value =             (0x73) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.

-------------------
key 4 = SMART capabilities
value =             (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.

-------------------
key 5 = Error logging capability
value =         (0x01) Error logging supported.
                    General Purpose Logging supported.

-------------------
key 6 = Short self-test routine
recommended polling time
value =     (   2) minutes.

-------------------
key 7 = Extended self-test routine
recommended polling time
value =     ( 100) minutes.

-------------------
key 8 = Conveyance self-test routine
recommended polling time
value =     (   3) minutes.

-------------------
key 9 = SCT capabilities
value =           (0x1081) SCT Status supported.

-------------------

Upvotes: 2

Matt Jacob

Reputation: 6553

The format of this data isn't the greatest, but at least it's predictable. We can parse it according to what the beginning of each line looks like.

use strict;
use warnings;
use Data::Dumper;

my %data;
my $key;
my $record;

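while (<DATA>) {
    chomp;

    if (s/^\s+/ /) {
        # indented line: continuation of the current value
        $record .= $_;
    } elsif (s/^([^:]+):\s\s+//) {
        # unindented "key:  value" line: a key (or the end of a wrapped key) and the start of its value
        if (length($record)) {
            # a previous record is still pending, so store it first
            $data{$key} = $record;
            $key = '';
        }

        $key .= $1;
        $record = $_;
    } else {
        # unindented line without a colon: first part of a key wrapped over two lines
        $data{$key} = $record;
        $key = $_ . ' ';
        $record = '';
    }
}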

$data{$key} = $record;
print Dumper(\%data);

__DATA__
Offline data collection status:  (0x00) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:        (  139) seconds.
Offline data collection
capabilities:            (0x73) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    ( 100) minutes.
Conveyance self-test routine
recommended polling time:    (   3) minutes.
SCT capabilities:          (0x1081) SCT Status supported.

Output:

$VAR1 = {
          'Error logging capability' => '(0x01) Error logging supported. General Purpose Logging supported.',
          'Total time to complete Offline data collection' => '(  139) seconds.',
          'SCT capabilities' => '(0x1081) SCT Status supported.',
          'Offline data collection capabilities' => '(0x73) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. No Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.',
          'SMART capabilities' => '(0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.',
          'Conveyance self-test routine recommended polling time' => '(   3) minutes.',
          'Self-test execution status' => '(   0) The previous self-test routine completed without error or no self-test has ever been run.',
          'Extended self-test routine recommended polling time' => '( 100) minutes.',
          'Offline data collection status' => '(0x00) Offline data collection activity was never started. Auto Offline Data Collection: Disabled.',
          'Short self-test routine recommended polling time' => '(   2) minutes.'
        };

Upvotes: 0
