J.doe
J.doe

Reputation: 373

Algorithm for extracting a language from DFA

I am trying to find an algorithm to extract a language from a DFA. If that language is infinite then I only need to report up to : maxcount <= 1000 strings. there are no restrictions on what order the are reported.

I have tried:(I don't know how to write algorithms so I just explain in words what I did)

a recursive algorithm - for each transition from one state to the next state/states save the transition character and then make a recursive call for each transition,if that transition was to an accepting state then report .if we have reached a "dead end" then there is noting to recurse on.

suppose the start state is state 1 and there are two transitions to: state 2 and state 3 with transition characters 'a','b' respectively. Then there is a transition from state 3 to state 4 with transition character 'c', only state 4 is an accepting state.

going from state 1 to state 2 will give save 'a' as transition character , but since state 2 is a dead end there will be noting to recurse on and since it is not an accepting state then there is noting to report.

Going from state 1 to state 3 will save 'b' and then go from state 3 to state 4 will save 'c' so that we have a total of "bc" and since state 4 does not have any transitions the recursion ends. We report "bc" because state 4 is an accepting state


If I have DFA with a loop - then in order to not get "stuck" in that loop I have made sure that "if" there is a another possible transition then we will always pick that transition and not the transition we made last time/times we where at that state ( a kind of memory for which transition we made at which states )

The algorithm works for small DFAs but it will give a Stack overflow on bigger DFA:s (imagine 20 transitions from state 1 to state 2 and 30 transitions from state 2 to state 3 and so on ).

Can anybody recommend more efficient algorithm?

Upvotes: 2

Views: 1276

Answers (4)

Gene
Gene

Reputation: 46990

If you use a BFS, then cycles don't matter. The idea is to define search nodes that contain the current state and a pointer to the predecessor node. Therefore when you visit a node for an accepting state, you can trace the previous pointers backward to determine the accepted string (in reverse). It turns out that its elegant if a search node also contains the character that caused the transition from the previous node's state to the current one.

Here's an approach in Java:

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

class Experimental {
  // DFA state with its transitions, possibly accepting.
  static class State {
    final Map<Character, State> transitions = new HashMap<>();
    final boolean accept;    
    State(boolean accept) {
      this.accept = accept;
    }
  }

  // A little DFA.
  static final State s0 = new State(false);
  static final State s1 = new State(false);
  static final State s2 = new State(true);
  static final State s3 = new State(true);
  static {
    s0.transitions.put('a', s1);
    s0.transitions.put('b', s2);
    s0.transitions.put('c', s3);
    s1.transitions.put('d', s3);
    s2.transitions.put('e', s0);
    s2.transitions.put('f', s1);
  }

  // An enumerator of strings accepted by the DFA in order of length.
  static class Enumerator {
    static class Node {
      final Node prev;
      final char prevCh;
      final State state;
      Node(State start) {
        this(null, Character.MIN_VALUE, start);
      } 
      Node(Node prev, char ch, State state) {
        this.prev = prev;
        this.prevCh = ch;
        this.state = state;
      }
    }

    final Deque<Node> queue = new ArrayDeque<>();
    final List<String> output = new ArrayList<>();
    final State start;

    Enumerator(State start) {
      this.start = start;
    }

    Enumerator enumerate(int outputLimit) {
      queue.clear();
      output.clear();
      // Enqueue a search node for the start state.
      queue.add(new Node(start));
      while (!queue.isEmpty() && output.size() < outputLimit) {
        Node current = queue.pollFirst();
        if (current.state.accept) {
          // Follow prev pointers to build the accepted string.
          StringBuilder sb = new StringBuilder();
          for (Node p = current; p.prev != null; p = p.prev) {
            sb.append(p.prevCh);
          }
          output.add(sb.reverse().toString());
        }
        // Enqueue nodes for the successors of current state.
        for (Entry<Character, State> transition : current.state.transitions.entrySet()) {
          queue.addLast(new Node(current, transition.getKey(), transition.getValue()));
        }
      }
      return this;
    }
  }

  public static void main(String[] args) {
    System.out.println(new Enumerator(s0).enumerate(20).output);
  }
}

Output:

[b, c, ad, beb, bec, bfd, bead, bebeb, bebec, bebfd, bebead, bebebeb, bebebec, bebebfd, bebebead, bebebebeb, bebebebec, bebebebfd, bebebebead, bebebebebeb]

Upvotes: 1

rici
rici

Reputation: 241891

I'd do this with a breadth-first search over the DFA, which will produce strings ordered by length.

Define an object consisting of a state and a string (there is a more memory-efficient solution here, but I think that if you only need to produce 1000 strings, this will be fine). Then create a work-queue of such objects. Initialize the workqueue with a single object whose state is the start state and whose string is empty.

Now repeat the following three steps until you have found maxcount strings or the queue becomes empty:

  1. Remove the first object in the queue.

  2. If its state is an accepting state, output its string.

  3. For each possible out transition (which consists of a character and a new state), add to the end of the queue a new object with the transition's state and the concatenation of the object's string with the transition's character.

Upvotes: 1

djechlin
djechlin

Reputation: 60808

  1. Run a cycle detection algorithm.
  2. If there are no cycles, run any path finding algorithm (BFS, DFS, Dijkstra, A*, etc) to list all paths from start to finish.
  3. If there is a cycle run pathfinding to find a path from the start node to the cycle. Then output (start node to the cycle) + (the cycle starting at the connecting node) * N for N = 1,2,3,...,1000.

Alternately:

  1. Convert to regex.
  2. Generate words from regex.

Upvotes: 1

Wasi Ahmad
Wasi Ahmad

Reputation: 37741

Note: I am guessing you can model this problem as finding all paths between start state and all accepting states. So, i believe this can be accomplished with a depth-first search of the graph. The depth-first search will find all non-cyclical paths between two nodes.

If you are familiar with Depth First Search (DFS) algorithm which is a graph search/traversal algorithm can be helpful in your problem. Let me give you a very simple example.

Given a directed graph, a source vertex ‘s’ and a destination vertex ‘d’, print all paths from given ‘s’ to ‘d’. Consider the following directed graph. Let the s be 2 and d be 3. There are 4 different paths from 2 to 3.

enter image description here

The idea is to do Depth First Traversal of given directed graph. Start the traversal from source. Keep storing the visited vertices in an array say path[]. If we reach the destination vertex, print contents of path[]. The important thing is to mark current vertices in path[] as visited also, so that the traversal doesn’t go in a cycle.

Sample Code: You can find a very simple code in Java here to find all paths between two nodes in a graph.

Upvotes: 0

Related Questions