Reputation: 886
Let L = {an | n >= 0}, where and for all n >= 1.
Thus L consists of sequences of a of all lengths, including a sequence of length 0. Let L2 be any infinite subset of L. I need to show there always exists a DFA to recognize (L2)*.
If L2 is a finite subset it is very obvious as L2 would be a DFA and hence by kleene closure L2* would be recognized by a DFA. But I am unable to get it for infinite subset as L2 may not be expressed as DFA for eg length of strings is prime.
Upvotes: 0
Views: 906
Reputation: 56809
While there exists a DFA to describe a set L of all strings an, n >= 0, there is no guarantee that a DFA exists for all subsets of L. The subset of L which contains all strings whose length is prime, as you have mentioned, is one example where a DFA the describes the language does not exist.
The correct approach would be to directly prove that (L')* is a regular language for any subset L' of L.
Let us define GCD(K) = GCDw ∈ K, |w| > 0 (|w|), where K is any non-empty subset of L. We can now refer to the greatest common divisor of all the lengths of all non-empty words in a language K as GCD(K). This definition applies for both finite and infinite subset of L.
Similarly, we can define LCM(K) = LCMw ∈ K, |w| > 0 (|w|), where K is any non-empty and finite subset of L.
We will try to prove that when GCD(L') = 1, there exists a number M such that all string an, n >= M belongs to the language (L')*. This leads to (L')* being a regular language, since we can construct a regular expression of the form:
All strings of length less than M and belongs to (L')*
OR
All strings of length more than or equal to M
The regular expression above has a corresponding DFA which has M + 1 states.
When GCD(L') > 1, we can reduce the problem to the case of GCD = 1 by "dividing" all words in the subset L' by GCD(L').
If GCD(L') = 1 (set-wise coprime), there exists a finite subset S of L' where GCD of the length of all strings in S is also 1.
We can prove the claim above by construction.
- Pick any element w1 from L', |w1| > 0 and construct set S1 = {w1}
- If GCD(Sn) = 1, Sn is the set we want to find.
- If GCD(Sn) > 1, pick an element wn+1 from L' and construct set Sn+1 = {wn+1} ∪ Sn, so that
GCD(Sn+1) < GCD(Sn)If GCD(Sn) > 1, there always exists an element from set L' that decreases the GCD when we add it to the set; otherwise, the GCD of the set L' cannot be 1. And since the length of the first element w1 has finite number of factors, the size of the final set S is finite.
Back to the problem, for any subset L' of L, we can find a finite subset S of L', which satisfies GCD(L') = GCD(S). From the set S, we can construct a generalized linear Diophantine equation with |S| unknowns ai:
a1|w1| + a2|w2| + ... + a|S||w|S|| = c where c is a non-negative integer
Since GCD(S) = 1, the equation above is always solvable, by recursively applying the solution to the simplest form of linear Diophantine equation ax + by = c.
Solve the generalized Diophantine equations above for c = 0 to (LCM(S) - 1). The solutions (a1, a2, ..., a|S|) can contain negative numbers. However, we can keep adding multiples of LCM(S) to both sides of the equations until all the solutions contain only non-negative numbers.
Let k be the smallest multiple of LCM(S) so that all Diophantine equations for c = k * LCM(S) + q, q = 0 to (LCM(S) - 1) has non-negative solution. Then we can define M as k * LCM(S), since any strings whose length larger than M can be decomposed as concatenation of words in S (thus in L').
Suppose L' is set of all strings in L whose length is prime.
Let us construct set S = {a2, a5}. S can be {a2, a19} or {a5, a23}, doesn't really matter. The final value of M might be different, but it doesn't affect the fact that (L')* is regular language.
We need to solve 10 equations (separately):
2a1 + 5a2 = 0 => (a1, a2) = (0, 0)
2a1 + 5a2 = 1 => (a1, a2) = (3, -1)
2a1 + 5a2 = 2 => (a1, a2) = (1, 0)
2a1 + 5a2 = 3 => (a1, a2) = (-1, 1)
2a1 + 5a2 = 4 => (a1, a2) = (2, 0)
2a1 + 5a2 = 5 => (a1, a2) = (0, 1)
2a1 + 5a2 = 6 => (a1, a2) = (3, 0)
2a1 + 5a2 = 7 => (a1, a2) = (1, 1)
2a1 + 5a2 = 8 => (a1, a2) = (4, 0)
2a1 + 5a2 = 9 => (a1, a2) = (2, 1)
Add one LCM(2,5) = 10. Note that we can modify the solution directly without solving again, due to the property of LCM:
2a1 + (5a2 + 10) = 1 + 10 => (a1, a2) = (3, 1)
(2a1 + 10) + 5a2 = 3 + 10 => (a1, a2) = (4, 1)
Since all the solutions are non-negative, and we only add one LCM(2,5), M = 10.
The regular expression for (L')* can be constructed:
a2+a4+a5+a6+a7+a8+a9+a10a*
The regular expression is not very compact, but it is not our concern here. For the sake of the proof, we only need to know there exists a number M so that an belongs to (L')* for all n >= M, which implies that there are finite number of states and a DFA can be constructed.
Upvotes: 1