Kleene Closure for infinite subset

Let L = {aⁿ | n >= 0}, where a⁰ = ε and aⁿ = aⁿ⁻¹a for all n >= 1.

Thus L consists of sequences of a of all lengths, including a sequence of length 0. Let L2 be any infinite subset of L. I need to show there always exists a DFA to recognize (L2)*.

If L2 is a finite subset, this is straightforward: a finite language is regular, so a DFA recognizes L2, and since regular languages are closed under Kleene star, a DFA also recognizes L2*. But I am unable to prove it for an infinite subset, since L2 itself may not be recognizable by any DFA, e.g. when L2 is the set of strings whose length is prime.

Upvotes: 0

Answers (1)

nhahtdh

Reputation: 56809

Approach

While there exists a DFA that recognizes the set L of all strings aⁿ, n >= 0, there is no guarantee that a DFA exists for every subset of L. The subset of L containing all strings whose length is prime, as you mentioned, is one example for which no DFA exists.

The correct approach would be to directly prove that (L')* is a regular language for any subset L' of L.

Definition

Let us define GCD(K) = GCD_{w ∈ K, |w| > 0} (|w|), where K is any subset of L that contains at least one non-empty word. We can now refer to the greatest common divisor of the lengths of all non-empty words in a language K as GCD(K). This definition applies to both finite and infinite subsets of L.

Similarly, we can define LCM(K) = LCM_{w ∈ K, |w| > 0} (|w|), where K is any non-empty and finite subset of L.

Proof

We will prove that when GCD(L') = 1, there exists a number M such that every string aⁿ with n >= M belongs to the language (L')*. This makes (L')* a regular language, since we can write a regular expression of the form:

All strings of length less than M that belong to (L')* (a finite set)
OR
All strings of length greater than or equal to M

The regular expression above has a corresponding DFA which has M + 1 states.
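To make the state count concrete, here is a minimal Python sketch of such a DFA over the single letter a. The names `build_unary_dfa` and `accepts` are illustrative and not part of the argument above, and the set of accepted lengths below M is assumed to be known already.

```python
def build_unary_dfa(short_lengths, M):
    """Build a DFA over the alphabet {a} with the M + 1 states 0..M.

    State i < M means "exactly i symbols read so far";
    state M means "M or more symbols read" and loops to itself.
    """
    transitions = {i: min(i + 1, M) for i in range(M + 1)}
    # Accept the listed short lengths plus everything of length >= M.
    accepting = {n for n in short_lengths if n < M} | {M}
    return transitions, accepting


def accepts(dfa, n):
    """Run the DFA on the word a^n and report whether it is accepted."""
    transitions, accepting = dfa
    state = 0
    for _ in range(n):
        state = transitions[state]
    return state in accepting
```

The transition table has exactly M + 1 entries, one per state, which matches the count stated above.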

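When GCD(L') = g > 1, we can reduce the problem to the case of GCD = 1 by "dividing" the length of every word in L' by g: aⁿ belongs to (L')* exactly when g divides n and a^(n/g) belongs to the Kleene closure of the reduced language.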
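A small sketch of this reduction, assuming a finite sample of positive word lengths stands in for L' (the helper names `sums_up_to` and `sums_via_reduction` are made up for illustration). The lengths reachable by concatenation can be computed directly, or by dividing by the GCD, handling the coprime case, and multiplying back; both routes should agree.

```python
from functools import reduce
from math import gcd


def sums_up_to(lengths, bound):
    """Return every n <= bound that is a sum of (repeated) elements of lengths.

    0 counts as the empty sum, i.e. the empty word of the Kleene closure.
    """
    ok = [True] + [False] * bound
    for n in range(1, bound + 1):
        ok[n] = any(s <= n and ok[n - s] for s in lengths)
    return {n for n in range(bound + 1) if ok[n]}


def sums_via_reduction(lengths, bound):
    """Same set, computed by dividing all lengths by their GCD first."""
    g = reduce(gcd, lengths)
    reduced = [n // g for n in lengths]  # these have GCD 1
    return {g * m for m in sums_up_to(reduced, bound // g)}
```

For the sample lengths 4, 6 and 10 (GCD 2), the reduced lengths are 2, 3 and 5, and every reachable length of the original problem is twice a reachable length of the reduced one.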

If GCD(L') = 1 (set-wise coprime), there exists a finite subset S of L' where GCD of the length of all strings in S is also 1.

We can prove the claim above by construction.

Pick any element w₁ from L', |w₁| > 0 and construct set S₁ = {w₁}

If GCD(S_n) = 1, S_n is the set we want to find.

If GCD(S_n) > 1, pick an element w_{n+1} from L' and construct set S_{n+1} = {w_{n+1}} ∪ S_n, so that
GCD(S_{n+1}) < GCD(S_n)

If GCD(S_n) > 1, there always exists an element of L' that decreases the GCD when we add it to the set; otherwise, GCD(S_n) would divide the length of every word in L', and GCD(L') could not be 1. Since every GCD(S_n) is a divisor of |w₁|, which has only finitely many divisors, and the GCD strictly decreases at each step, the process stops after finitely many steps, so the final set S is finite.
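The construction can be sketched in Python, assuming the word lengths of L' can be enumerated one by one (the function name `finite_coprime_subset` is illustrative). Instead of searching for a length that lowers the GCD, this variant scans the lengths in order and keeps only those that do. On an infinite input whose GCD is greater than 1 the loop would never stop, so the set-wise coprime assumption matters.

```python
from itertools import count
from math import gcd


def finite_coprime_subset(lengths):
    """Scan word lengths, keeping one only if it lowers the running GCD."""
    chosen, g = [], 0  # gcd(0, n) == n, so the first positive length is kept
    for n in lengths:
        if n <= 0:
            continue  # the empty word does not affect the GCD
        new_g = gcd(g, n)
        if new_g != g:
            chosen.append(n)
            g = new_g
        if g == 1:
            break
    return chosen


# Hypothetical usage: prime lengths, produced lazily by naive trial division.
primes = (n for n in count(2) if all(n % d for d in range(2, n)))
print(finite_coprime_subset(primes))  # prints [2, 3]
```

Each kept length replaces the running GCD by a proper divisor, which is why the returned list is always short.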

Back to the problem, for any subset L' of L, we can find a finite subset S of L', which satisfies GCD(L') = GCD(S). From the set S, we can construct a generalized linear Diophantine equation with |S| unknowns a_i:

a₁|w₁| + a₂|w₂| + ... + a_{|S|}|w_{|S|}| = c, where c is a non-negative integer

Since GCD(S) = 1, the equation above always has an integer solution, which can be found by recursively applying the solution of the simplest linear Diophantine equation ax + by = c.
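One way to carry out this recursion is the extended Euclidean algorithm, folded over the list of word lengths. The sketch below assumes positive lengths, and `extended_gcd` and `solve_linear` are illustrative names. The coefficients it returns may be negative and may differ from other valid solutions of the same equation.

```python
def extended_gcd(a, b):
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    return g, y, x - (a // b) * y


def solve_linear(lengths, c):
    """Return integer coefficients a_i with sum(a_i * lengths[i]) == c.

    Requires gcd(lengths) to divide c; coefficients may be negative.
    """
    g, coeffs = lengths[0], [1]  # invariant: sum(coeffs[i] * lengths[i]) == g
    for n in lengths[1:]:
        g, x, y = extended_gcd(g, n)  # new g == old g * x + n * y
        coeffs = [x * a for a in coeffs] + [y]
    if c % g != 0:
        raise ValueError("no integer solution: gcd does not divide c")
    return [a * (c // g) for a in coeffs]
```

When the GCD of the lengths is 1, the divisibility check never fails, which is the solvability claim made above.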

Solve the generalized Diophantine equations above for c = 0 to (LCM(S) - 1). The solutions (a₁, a₂, ..., a_|S|) can contain negative numbers. However, we can keep adding multiples of LCM(S) to both sides of the equations until all the solutions contain only non-negative numbers.

Let k be the smallest non-negative integer such that every Diophantine equation with c = k * LCM(S) + q, q = 0 to (LCM(S) - 1), has a non-negative solution. Then we can define M as k * LCM(S): any string of length at least M can be decomposed as a concatenation of words in S (and thus of words in L'), because every larger length is reached by adding further multiples of LCM(S), which is itself a multiple of every word length in S.
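The value of M can also be found by brute force rather than by solving each equation. The following sketch (the name `threshold` is an assumption, and `math.lcm` needs Python 3.9 or later) marks which lengths are non-negative combinations of the lengths in S and returns the first multiple of LCM(S) that starts a fully reachable block of LCM(S) consecutive lengths.

```python
from functools import reduce
from math import gcd, lcm


def threshold(S):
    """Return M = k * LCM(S) for the smallest k such that every length in
    [M, M + LCM(S)) is a non-negative integer combination of S."""
    if reduce(gcd, S) != 1:
        raise ValueError("lengths must have GCD 1")
    block = lcm(*S)
    reachable = [True]  # reachable[n]: n is a sum of elements of S (0 = empty sum)
    k = 0
    while True:
        hi = (k + 1) * block
        # Extend the table up to the end of the current block.
        for n in range(len(reachable), hi):
            reachable.append(any(s <= n and reachable[n - s] for s in S))
        if all(reachable[k * block:hi]):
            return k * block
        k += 1
```

For S = [2, 5] this returns 10. The result is a valid M but not necessarily the smallest possible one, which is all the proof needs.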

Example calculation based on the proof

Suppose L' is the set of all strings in L whose length is prime.

Let us construct the set S = {a², a⁵}. S could equally be {a², a¹⁹} or {a⁵, a²³}; the choice doesn't really matter. The final value of M might be different, but that doesn't affect the fact that (L')* is a regular language.

We need to solve 10 equations (separately):

2a₁ + 5a₂ = 0 => (a₁, a₂) = (0, 0)
2a₁ + 5a₂ = 1 => (a₁, a₂) = (3, -1)
2a₁ + 5a₂ = 2 => (a₁, a₂) = (1, 0)
2a₁ + 5a₂ = 3 => (a₁, a₂) = (-1, 1)
2a₁ + 5a₂ = 4 => (a₁, a₂) = (2, 0)
2a₁ + 5a₂ = 5 => (a₁, a₂) = (0, 1)
2a₁ + 5a₂ = 6 => (a₁, a₂) = (3, 0)
2a₁ + 5a₂ = 7 => (a₁, a₂) = (1, 1)
2a₁ + 5a₂ = 8 => (a₁, a₂) = (4, 0)
2a₁ + 5a₂ = 9 => (a₁, a₂) = (2, 1)

Add one LCM(2,5) = 10 to both sides. Only the equations for c = 1 and c = 3 have a negative number to repair. Note that we can modify the solution directly without solving again, due to the property of LCM:

2a₁ + (5a₂ + 10) = 1 + 10 => (a₁, a₂) = (3, 1)
(2a₁ + 10) + 5a₂ = 3 + 10 => (a₁, a₂) = (4, 1)

Since all the solutions are non-negative, and we only add one LCM(2,5), M = 10.

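The regular expression for (L')* can be constructed:

ε+a²+a³+a⁴+a⁵+a⁶+a⁷+a⁸+a⁹+a¹⁰a*

Here ε is included because a Kleene closure always contains the empty string, and a³ is included because it belongs to L' itself (3 is prime), even though 3 cannot be written as a sum of 2s and 5s. The regular expression is not very compact, but that is not our concern here. For the sake of the proof, we only need to know that there exists a number M such that aⁿ belongs to (L')* for all n >= M, which implies that a finite number of states suffices and a DFA can be constructed.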
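As a sanity check on the prime-length example, a brute-force sketch can list which lengths up to a bound lie in (L')*. The helper names and the bound of 60 are arbitrary choices for illustration. Only primes up to the bound are needed, since longer words cannot contribute to shorter lengths.

```python
def primes_up_to(bound):
    """Sieve of Eratosthenes: all primes <= bound."""
    sieve = [True] * (bound + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(bound ** 0.5) + 1):
        if sieve[p]:
            for m in range(p * p, bound + 1, p):
                sieve[m] = False
    return [n for n, is_prime in enumerate(sieve) if is_prime]


def star_lengths(lengths, bound):
    """Lengths n <= bound of words in the Kleene closure (0 = empty word)."""
    ok = [True] + [False] * bound
    for n in range(1, bound + 1):
        ok[n] = any(s <= n and ok[n - s] for s in lengths)
    return {n for n in range(bound + 1) if ok[n]}


in_closure = star_lengths(primes_up_to(60), 60)
print(sorted(set(range(61)) - in_closure))  # prints [1]
```

The only length missing up to the bound is 1, so every length from 10 onward is covered, in line with M = 10.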

Upvotes: 1