Nick
Nick

Reputation: 1629

Sampling without replacement from an unknown set

I have to extract uniformly DFAs (deterministic finite automata) from a set of DFA that I call S. It seems a simple problem, but I haven't the set S. S contains all the DFA of dimension n , thus I know dimension of S, I can build S but I can't because is very large. I know also the dimension of sets Sm where for example S3 is a subset of S and S3 contains all the DFA with 3 states, Sm contains all the DFA with m states where m < n .

I have not the set S and thus I have to simulate a uniformly sampling. Furthermore I have to do sampling without replacement. I create a set D={1,2,3........n} and for each value, that I call i, in D I associate the value |Si|/|S| where | | indicates the number of elements in the set that is the argument. Namely I created a distribution. Now I can extract a value from D according to this distribution. In this way I found the set from which extract a single DFA. For example if from D I extract 4 then I have to extract uniformly a DFA from S4.

But my question is, how can I sample a DFA from Si (S4 in the example above) without replacement? Namely if I have already extracted previously a specific DFA, in the next sampling I must avoid that specific DFA. Remark that a DFA is a matrix, a table (a bidimensional array). Remark also that extract a specific DFA means uniformly extract for each cell of the above mentioned table a value in {1,.....,k} where k is the number of the elements of the alphabet (you have to extract also for each state if is accepting or rejecting).

(I have to implement in C++11 but this is pretty irrelevant)

Upvotes: 1

Views: 100

Answers (1)

Lior Kogan
Lior Kogan

Reputation: 20648

If i understand your problem correctly, the trivial solution would be to keep every sampled DFA, and upon generating a new random one - check if it has been generated before. I guess your problem is the large amount of memory required to store them all.

If so, you may keep only the hash of each DFA - e.g. a 128-bit MurmurHash3, and compare the hash of newly generated DFAs with the stored hashes.

Upvotes: 2

Related Questions