Alex

Reputation: 44385

Random data generator matching a regex in python

I am looking for Python code that can create random data matching any regex. For example, if the regex is

\d{1,100}

I want a list of random digit strings, each with a random length between 1 and 100 (uniformly distributed).

There are some 'regex inverters' available (see here) which compute ALL possible matches. That is not what I want, and it is extremely impractical: the example above has more than 10^100 possible matches, which can never be stored in a list. I just need a function that returns one match at random.

Maybe there is already a package that can do this? I need a function that creates a matching string for ANY regex, not just the one above, since I have maybe 100 different regexes. I cannot hand-code a generator for each of them; I want the function to interpret the pattern and return a matching string.

Upvotes: 5

Views: 5376

Answers (3)

Brad Schoening

Reputation: 1381

Two Python libraries can do this: sre-yield and Hypothesis.

  1. sre-yield

sre-yield will generate all values matching a given regular expression. It uses SRE, Python's default regular expression engine.

For example,

import sre_yield
list(sre_yield.AllStrings('[a-z]oo$'))
['aoo', 'boo', 'coo', 'doo', 'eoo', 'foo', 'goo', 'hoo', 'ioo', 'joo', 'koo', 'loo', 'moo', 'noo', 'ooo', 'poo', 'qoo', 'roo', 'soo', 'too', 'uoo', 'voo', 'woo', 'xoo', 'yoo', 'zoo']

For decimal numbers,

list(sre_yield.AllStrings(r'\d{1,2}'))
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '00', '01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40', '41', '42', '43', '44', '45', '46', '47', '48', '49', '50', '51', '52', '53', '54', '55', '56', '57', '58', '59', '60', '61', '62', '63', '64', '65', '66', '67', '68', '69', '70', '71', '72', '73', '74', '75', '76', '77', '78', '79', '80', '81', '82', '83', '84', '85', '86', '87', '88', '89', '90', '91', '92', '93', '94', '95', '96', '97', '98', '99']

  2. Hypothesis

The unit test library Hypothesis will generate random matching examples. It is also built using SRE.

import hypothesis
g = hypothesis.strategies.from_regex(r'^[A-Z][a-z]{4}$')
g.example()

with output such as:

'Gssov', 'Lmsud', 'Ixnoy'

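For decimal numbers,

d = hypothesis.strategies.from_regex(r'^[0-9]{1,2}$')
d.example()

will output one- or two-digit decimal numbers such as 65, 7, 67, although they are not evenly distributed. Using \d instead of [0-9] yielded unprintable strings, presumably because \d in Python 3 also matches non-ASCII Unicode digits.

Note: anchor the pattern at both ends (or pass fullmatch=True to from_regex) to prevent extraneous characters around the match.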

Upvotes: 4

Jakub M.

Reputation: 33857

If the expressions you match do not use any "advanced" features, like look-ahead or look-behind, then you can parse them yourself and build a suitable generator.

Treat each part of the regex as a function returning something (e.g., between 1 and 100 digits) and glue them together at the top:

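import random
from string import digits, ascii_uppercase, ascii_letters

def joiner(*items):
    # Calls each generator once and concatenates the results.
    # For consistency with roll() and rand(), this could return a lambda instead.
    return ''.join(item() for item in items)

def roll(item, n1, n2=None):
    # Repeat item between n1 and n2 times (exactly n1 times if n2 is omitted).
    if n2 is None:
        n2 = n1
    return lambda: ''.join(item() for _ in range(random.randint(n1, n2)))

def rand(collection):
    # Pick one random element of collection on each call.
    return lambda: random.choice(collection)

# this is a generator for /\d{1,10}:[A-Z]{5}/
print(joiner(roll(rand(digits), 1, 10),
             rand(':'),
             roll(rand(ascii_uppercase), 5)))

# [A-C]{2}\d{2,20}@\w{10,1000}
# (ascii_letters covers only part of \w, which also matches digits and '_')
print(joiner(roll(rand('ABC'), 2),
             roll(rand(digits), 2, 20),
             rand('@'),
             roll(rand(ascii_letters), 10, 1000)))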

Parsing the regex is a separate problem, so this solution is not universal, but it may be sufficient.
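
For the parsing step, one option is to borrow the parser that the standard re module uses internally. The following is only a minimal sketch under several assumptions: it relies on an undocumented module (re._parser in Python 3.11+, sre_parse before that) whose tree format may change between versions, it handles only literals, character classes, \d, \w, \s, ".", groups, alternation and repetition, it restricts itself to a printable ASCII alphabet, and it caps unbounded repeats such as * and + at an arbitrary limit. The names random_match, ALPHABET, CATEGORIES and max_unbounded are made up for this example.

```python
import random
import re
import string

try:                                  # Python 3.11+: parser lives in re._parser
    import re._parser as rp
except ImportError:                   # older versions: top-level sre_parse module
    import sre_parse as rp

# Assumed character universe for "." and negated classes (printable ASCII only).
ALPHABET = string.ascii_letters + string.digits + string.punctuation + " "

# ASCII-only stand-ins for the categories; each is a subset of what re accepts.
CATEGORIES = {
    rp.CATEGORY_DIGIT: string.digits,
    rp.CATEGORY_WORD: string.ascii_letters + string.digits + "_",
    rp.CATEGORY_SPACE: " \t\n",
}

def _charset(items):
    """Expand the contents of a [...] class into a sorted list of characters."""
    chars, negate = set(), False
    for op, av in items:
        if op == rp.NEGATE:
            negate = True
        elif op == rp.LITERAL:
            chars.add(chr(av))
        elif op == rp.RANGE:
            lo, hi = av
            chars.update(chr(c) for c in range(lo, hi + 1))
        elif op == rp.CATEGORY and av in CATEGORIES:
            chars.update(CATEGORIES[av])
        else:
            raise NotImplementedError(f"unsupported class item: {op}")
    if negate:
        chars = set(ALPHABET) - chars
    return sorted(chars)

def _generate(tokens, max_unbounded):
    """Walk the parse tree and emit one random string for it."""
    out = []
    for op, av in tokens:
        if op == rp.LITERAL:
            out.append(chr(av))
        elif op == rp.ANY:
            out.append(random.choice(ALPHABET))
        elif op == rp.IN:
            out.append(random.choice(_charset(av)))
        elif op in (rp.MAX_REPEAT, rp.MIN_REPEAT):
            lo, hi, sub = av
            if hi == rp.MAXREPEAT:            # "*", "+", "{n,}": cap the count
                hi = lo + max_unbounded
            for _ in range(random.randint(lo, hi)):   # uniform repeat count
                out.append(_generate(sub, max_unbounded))
        elif op == rp.SUBPATTERN:
            out.append(_generate(av[-1], max_unbounded))  # last field is the body
        elif op == rp.BRANCH:
            out.append(_generate(random.choice(av[1]), max_unbounded))
        elif op == rp.AT:                     # anchors produce no characters
            continue
        else:                                 # look-arounds, backreferences, ...
            raise NotImplementedError(f"unsupported construct: {op}")
    return "".join(out)

def random_match(pattern, max_unbounded=10):
    """Return one random string matching pattern (supported subset only)."""
    return _generate(rp.parse(pattern), max_unbounded)

sample = random_match(r"\d{1,100}")
print(sample, bool(re.fullmatch(r"\d{1,100}", sample)))
```

Because the repeat count is drawn with random.randint, the length for \d{1,100} is uniform between 1 and 100, as the question asks. The strings themselves are not uniform over all matches in general: with alternation, for instance, each branch is picked with equal probability regardless of how many strings it can produce.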

Upvotes: 2

John Jiang

Reputation: 11499

From this answer

You could try using Python to call this Perl module:

https://metacpan.org/module/String::Random

Upvotes: 0
