Anurag Sharma

Reputation: 5039

Extract Numbers and Size Information (KB, MB, etc) from a String in Python

I have a string like this:

"44MB\n" (it can be any variation, such as 44mb, 44 MB, 44 kb, or 44 B)

I want to separate 44 and MB from the above string. I have written this code to extract the number:

import re
mystring = "44MB\n"
re.findall(r'\d+', mystring)

To extract the size, I want to avoid using if statements like:

if "kb" in mystring.lower():
    # Do stuff
if .......

How can I extract the size info using a regex?

Upvotes: 2

Views: 2980

Answers (3)

ojii

Reputation: 4781

This script:

import re


test_string = '44.5MB\n12b\n6.5GB\n12pb'

regex = re.compile(r'(\d+(?:\.\d+)?)\s*([kmgtp]?b)', re.IGNORECASE)

order = ['b', 'kb', 'mb', 'gb', 'tb', 'pb']

for value, unit in regex.findall(test_string):
    print(int(float(value) * (1024**order.index(unit.lower()))))

Will print:

46661632
12
6979321856
13510798882111488

These are the sizes it found, in bytes.
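To get back the number and the unit separately, as the question asks, the same pattern can be wrapped in a small function. This is only a sketch: `parse_size`, `SIZE_RE`, and `UNITS` are hypothetical names chosen for illustration, and it assumes 1024-based multipliers like the script above.

```python
import re

# Hypothetical helper built on the regex from the script above.
SIZE_RE = re.compile(r'(\d+(?:\.\d+)?)\s*([kmgtp]?b)', re.IGNORECASE)
UNITS = ['b', 'kb', 'mb', 'gb', 'tb', 'pb']


def parse_size(text):
    """Return (number, unit, size_in_bytes) for the first size in text, or None."""
    match = SIZE_RE.search(text)
    if match is None:
        return None
    value, unit = match.groups()
    # The position of the unit in UNITS is the power of 1024 to apply.
    multiplier = 1024 ** UNITS.index(unit.lower())
    return float(value), unit.upper(), int(float(value) * multiplier)


print(parse_size("44MB\n"))
print(parse_size("44 kb"))
```

Returning None for text without a size lets the caller decide how to handle bad input instead of catching an exception.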

Upvotes: 5

Pranava Sheoran

Reputation: 529

The following regex should match the size strings you are trying to parse:

import re

my_string = "44MB\n"
# One or more digits, an optional space, then an optional k/m prefix followed by b
match_Obj = re.match(r'^(\d+)\s?([kmKM]?[Bb])$', my_string)

print("size:", match_Obj.group(1))
print("units:", match_Obj.group(2))

Output:

size: 44
units: MB

Here is a link where you can test this regex:

Regex101

Upvotes: 1

donkopotamus

Reputation: 23206

You could use a regex like the following to search for both the size and the unit (kb, mb):

re.compile(r"(?i)(?P<size>\d+)\s*(?P<unit>[km]?b)")

Trying it out:

>>> rgx = re.compile(r"(?i)(?P<size>\d+)\s*(?P<unit>[km]?b)")
>>> for x in ("44 mb", "44mb", "44kB"):
...     print(rgx.search(x).groups())
... 
('44', 'mb')
('44', 'mb')
('44', 'kB')

For dealing with other prefixes, just alter the unit portion of the regex.
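As a rough illustration of altering the unit portion, the sketch below widens the prefix class from `[km]` to `[kmgtp]`. The choice of prefixes is an assumption; include whichever ones your input can contain.

```python
import re

# Same pattern as above, with the prefix class widened from [km] to [kmgtp].
rgx = re.compile(r"(?i)(?P<size>\d+)\s*(?P<unit>[kmgtp]?b)")

for text in ("44 GB", "44tb", "44 B"):
    m = rgx.search(text)
    # Named groups can be read by name instead of by position.
    print(m.group("size"), m.group("unit"))
```

Because the prefix is optional (`?`), plain bytes such as "44 B" still match.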

It's worth noting that, since you say case doesn't matter, "kb" is also a valid symbol for kilobit rather than kilobyte.
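If your input does follow the convention that a lowercase "b" means bits and an uppercase "B" means bytes, a case-sensitive pattern can keep the two apart. This is a sketch under that assumption only; `describe` and the group names are hypothetical, and it does not apply if case is arbitrary as in the question.

```python
import re

# Assumption: lowercase "b" means bits, uppercase "B" means bytes.
# No (?i) flag here, so the case of the final letter is preserved.
rgx = re.compile(r"(?P<size>\d+)\s*(?P<prefix>[kKmM]?)(?P<base>[bB])")


def describe(text):
    """Return (number, lowercase prefix, 'bits' or 'bytes'), or None if no match."""
    m = rgx.search(text)
    if m is None:
        return None
    kind = "bytes" if m.group("base") == "B" else "bits"
    return int(m.group("size")), m.group("prefix").lower(), kind


print(describe("44 kb"))
print(describe("44KB"))
```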

Upvotes: 0
