Reputation: 4068

Getting a value expressed with the memory unit

I'm searching for a way to reduce the following piece of code to a single regexp statement:

if( $current_value =~ /(\d+)(MB)*/ ){
    $current_value = $1 * 1024 * 1024;
}
elsif( $current_value =~ /(\d+)(GB)*/ ){
    $current_value = $1 * 1024 * 1024 * 1024;
}
elsif( $current_value =~ /(\d+)(KB)*/ ){
    $current_value = $1 * 1024;
}

The code converts a value into bytes. The value can be a plain number (bytes), or a number followed by KB (kilobytes), MB (megabytes), and so on. How can I reduce this block of code?

Upvotes: 3

Answers (5)

toolic

Reputation: 62236

Use Number::Format. Its format_bytes function converts a byte count into a human-readable string:

use warnings;
use strict;

use Number::Format qw(format_bytes);
print format_bytes(1024), "\n";
print format_bytes(2535116549), "\n";

Output:

    1K
    2.36G

Upvotes: 5

Brad Gilbert

Reputation: 34130

There is a problem with using KB for 1024 bytes. Kilo as a prefix generally means 1000 of a thing, not 1024.

The problem gets even worse with MB, since it has meant 1000*1000, 1024*1024, and 1000*1024.

A 1.44 MB floppy actually holds 1.44 * 1000 * 1024 bytes.

The only real way out of this is to use the newer IEC prefix KiB (kibibyte) to mean 1024 bytes.
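To make the floppy arithmetic concrete, here is a small Perl sketch. The variable names are illustrative only. It takes the 1.44 * 1000 * 1024 bytes and expresses that count in SI megabytes and in IEC mebibytes:

```perl
use strict;
use warnings;

my $MB  = 1000 * 1000;    # megabyte, SI (decimal) prefix
my $MiB = 1024 * 1024;    # mebibyte, IEC (binary) prefix

# The "1.44 MB" floppy uses a mixed unit of 1000 * 1024 bytes.
my $floppy_bytes = 1.44 * 1000 * 1024;

printf "%d bytes\n", $floppy_bytes;                # 1474560 bytes
printf "%.3f MB (SI)\n", $floppy_bytes / $MB;      # 1.475 MB (SI)
printf "%.3f MiB (IEC)\n", $floppy_bytes / $MiB;   # 1.406 MiB (IEC)
```

The same disk comes out as roughly 1.475 MB or 1.406 MiB, and neither value is 1.44.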

The way you implemented it also has the limitation that you can't use 8.4Gi to mean 8.4 * 1024 * 1024 * 1024. To remove that limitation, I used $RE{num}{real} from Regexp::Common instead of \d+.

Some of the other answers hardwire the match by writing out every possible unit. That can get very tedious, not to mention error-prone. To get around that, I used the keys of %multiplier to generate the regex, so if you add or remove elements from %multiplier you won't have to modify the regex by hand.

use strict;
use warnings;
use Regexp::Common;

my %multiplier;
my $multiplier_match;
{

  # populate %multiplier
  my %exponent = (
    K => 1, # Kilo  Kibi
    M => 2, # Mega  Mebi 
    G => 3, # Giga  Gibi
    T => 4, # Tera  Tebi
    P => 5, # Peta  Pebi
    E => 6, # Exa   Exbi
    Z => 7, # Zetta Zebi
    Y => 8, # Yotta Yobi
  );
  while( my ($str,$exp) = each %exponent ){
    @multiplier{ $str,      "${str}B"  } = (1000 ** $exp) x2; # K  KB
    @multiplier{ "${str}i", "${str}iB" } = (1024 ** $exp) x2; # Ki KiB
  }
  # %multiplier now holds 32 pairs (8*4)

  # build $multiplier_match
  local $" = '|';    # join interpolated arrays with '|'
  my @keys = keys %multiplier;
  $multiplier_match = qr(@keys);

}

sub remove_multiplier{
  die unless @_ == 1;
  local ($_) = @_;

  #  s/^($RE{num}{real})($multiplier_match)$/ $1 * $multiplier{$2} /e;
  if( /^($RE{num}{real})($multiplier_match)$/ ){
    return $1 * $multiplier{$2};
  }

  return $_;
}

If you absolutely need 1K to mean 1024, you only need to change one line:

# @multiplier{ $str, "${str}B"  } = (1000 ** $exp) x2; # K  KB
  @multiplier{ $str, "${str}B"  } = (1024 ** $exp) x2; # K  KB

Note that since I used $RE{num}{real} from Regexp::Common, it will also work with 5.3e1Ki.
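If installing Regexp::Common is not an option, the same idea can be sketched in core Perl. This is a hypothetical variant, not the code above. The hand-written $number pattern stands in for $RE{num}{real}. It accepts forms such as 42, 8.4, .5 and 5.3e1, but it is an assumption and less thorough than the module. The keys are escaped with quotemeta and sorted longest first before being joined into the alternation:

```perl
use strict;
use warnings;

# Assumption: a simplified stand-in for $RE{num}{real} (no capturing groups).
my $number = qr/[-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?/;

my %multiplier;
for my $pair ([K => 1], [M => 2], [G => 3], [T => 4],
              [P => 5], [E => 6], [Z => 7], [Y => 8]) {
    my ($str, $exp) = @$pair;
    @multiplier{ $str,      "${str}B"  } = (1000 ** $exp) x 2;  # K  KB
    @multiplier{ "${str}i", "${str}iB" } = (1024 ** $exp) x 2;  # Ki KiB
}

# Build the alternation from the hash keys: escaped, longest first.
my $alternation = join '|',
    map { quotemeta } sort { length($b) <=> length($a) } keys %multiplier;
my $multiplier_match = qr/$alternation/;

sub remove_multiplier {
    my ($value) = @_;
    if ($value =~ /^($number)($multiplier_match)$/) {
        return $1 * $multiplier{$2};
    }
    return $value;    # no recognised suffix: return the input unchanged
}

print remove_multiplier($_), "\n" for qw(8.4Gi 5.3e1Ki 10KB 42);
```

Because the pattern is anchored at both ends, the order of the alternatives does not affect correctness. Sorting the keys longest first only makes the intent explicit.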

Upvotes: 0

benzado

Reputation: 84348

You could set up a hash like this:

my %FACTORS = ( 'KB' => 1024, 'MB' => 1024**2, 'GB' => 1024**3 );

And then parse the text like this:

if ( $current_value =~ /(\d+)(KB|MB|GB)/ ) {
    $current_value = $1 * $FACTORS{$2};
}

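In your example the regex has a *, which I'm not sure you intend. * means "zero or more", so (\d+)(MB)* would match 10, 10MB, 10MBMB, or 10MBMBMBMBMBMBMB. Because zero repetitions are allowed, your first if branch succeeds for every value that contains a digit, and the GB and KB branches never run.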
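To see the effect, here is a small Perl sketch. first_branch_matches and to_bytes are hypothetical names. The ^ and $ anchors are an addition to benzado's pattern, not part of it. They reject trailing junk such as 10MBMB instead of partially matching it:

```perl
use strict;
use warnings;

my %FACTORS = (KB => 1024, MB => 1024**2, GB => 1024**3);

# The question's first test: (MB)* may match zero times, so this succeeds
# for any string that contains a digit, whatever its unit.
sub first_branch_matches {
    my ($value) = @_;
    return $value =~ /(\d+)(MB)*/ ? 1 : 0;
}

# benzado's hash lookup with an anchored pattern; unmatched input is
# returned unchanged.
sub to_bytes {
    my ($value) = @_;
    return $value =~ /^(\d+)(KB|MB|GB)$/ ? $1 * $FACTORS{$2} : $value;
}

for my $value (qw(10 10KB 10MB 10GB 10MBMB)) {
    printf "%-7s first branch matches: %d, to_bytes: %s\n",
        $value, first_branch_matches($value), to_bytes($value);
}
```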

Upvotes: 4

zgpmax

Reputation: 2857

You can do it in one regexp by putting code snippets inside the regexp to handle the three cases differently:

my $r;

$current_value =~ s/
    (\d+)(?:
          Ki (?{ $r = $^N * 1024 })
        | Mi (?{ $r = $^N * 1024 * 1024 })
        | Gi (?{ $r = $^N * 1024 * 1024 * 1024 })
    )/$r/xso;
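A plainer way to get a single statement is the /e modifier on s///. With /e, the replacement part is evaluated as Perl code, so the unit can be looked up in a hash. The commented-out line in Brad Gilbert's answer takes the same approach. This sketch reuses the KB/MB/GB units and the %FACTORS hash from benzado's answer:

```perl
use strict;
use warnings;

my %FACTORS = (KB => 1024, MB => 1024**2, GB => 1024**3);

my $current_value = '52KB';

# One substitution: capture the number and unit, then compute the bytes.
$current_value =~ s/^(\d+)(KB|MB|GB)$/$1 * $FACTORS{$2}/e;

print "$current_value\n";   # 53248
```

A plain number such as 50 does not match, so it is left unchanged.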

Upvotes: 1

Konerak

Reputation: 39773

Using a modified version of benzado's code, here is a test you can run to check that it works.

I'd advise always putting code like this in a reusable subroutine and writing a small unit test for it:

use strict;
use warnings;
use Test::More;

plan tests => 4;

##
# Convert a string denoting '50MB' into an amount in bytes.
my %FACTORS = ( 'KB' => 1024, 'MB' => 1024*1024, 'GB' => 1024*1024*1024 );
sub string_to_bytes {
        my $current_value = shift;

        if ( $current_value =~ /(\d+)(KB|MB|GB)/ ) {
            $current_value = $1 * $FACTORS{$2};
        }
        return $current_value;
}

my $tests = {
        '50' => 50,
        '52KB' => 52*1024,
        '55MB' => 55*1024*1024,
        '57GB' => 57*1024*1024*1024
};

foreach(keys %$tests) {
        is( string_to_bytes($_),$tests->{$_},
            "Testing if $_ becomes $tests->{$_}");
}

Running this gives:

$ perl testz.pl
1..4
ok 1 - Testing if 55MB becomes 57671680
ok 2 - Testing if 50 becomes 50
ok 3 - Testing if 52KB becomes 53248
ok 4 - Testing if 57GB becomes 61203283968

Now you can

Add more test cases (what happens with BIG numbers? What do you want to happen? What for undef, for strings, when kB is written with a small k, when you encounter kibiB or kiB or Kb?)
Turn this into a module
Write documentation in POD
Upload the Module to CPAN

And voilà!
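As a starting point for the first item in that list, here is a hypothetical extension of string_to_bytes with extra test cases. The behaviours it picks are assumptions chosen for illustration: matching is anchored and case-insensitive, whitespace before the unit is allowed, and anything unparseable returns undef.

```perl
use strict;
use warnings;
use Test::More;

my %FACTORS = (KB => 1024, MB => 1024**2, GB => 1024**3);

sub string_to_bytes {
    my ($value) = @_;
    # Explicit undef (not a bare return) so is() still gets three arguments.
    return undef unless defined $value;
    return $value + 0 if $value =~ /^\d+$/;           # plain byte count
    if ($value =~ /^(\d+)\s*(KB|MB|GB)$/i) {
        return $1 * $FACTORS{ uc $2 };               # normalise "kB", "mb", ...
    }
    return undef;                                     # unrecognised input
}

is(string_to_bytes('50'),    50,           'plain number');
is(string_to_bytes('52kB'),  52 * 1024,    'lower-case k');
is(string_to_bytes('55 MB'), 55 * 1024**2, 'space before unit');
is(string_to_bytes(undef),   undef,        'undef stays undef');
is(string_to_bytes('12XB'),  undef,        'unknown unit rejected');
is(string_to_bytes('7GBGB'), undef,        'repeated unit rejected');

done_testing();
```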

Upvotes: 1
