How to aggregate cycles in a sequence?

Question

I have a vector of pairs representing hops between nodes which I would like to collapse when there are cycles (as in aggregate time between the hops in a cycle to display it as just one). So, for example, The path A --> B --> C --> D --> B --> C --> D --> E traverses the subpath B--> C --> D twice, so in my structure I would have something like:

(A,B,1)(B,C,3)(C,D,2)(D,B,4)(B,C,5)(C,D,8)(D,E,6)

which I would ideally reduce to:

(A,B,1)(B,C,3+5)(C,D,2+8)(D,E,6)

storing also the 4 from D to B (loop-back edge time) to aggregate separately and be able to display B --> C --> D in a condensed way (the aggregated edge times and an aggregated loop-back time for all the D-->B instances alongside a count of how many times we looped)

How can I go about this?

Nicolas · Accepted Answer

I would go for a suffix array or suffix tree. Just produce tokens (in this case (from, to) ) and put it in the Suffix Array or Suffix Tree. Then you can get the pairs. Simple but not efficietn way:

produce tokens, produce all suffixes of this tokenstream. sort them. and then you have all common subsequences in that sorted list near each other (tokenstream length - tokenstream suffix length = position) You could do the sorting with quicksort for example, or you just looking for a suffix array implemantation. Suffix Arrays could be constructed in O(n) and with O(n) space. And finding maximal pairs/repeats (thats what you want) can be done in O(n+k) (where n = tokennumber, and k = cycles in your list)

Perhaps this helps. Then you could produce a tokenstream like: ABCDBCDE

Quick and dirty solution is here

How to aggregate cycles in a sequence?

Answers (2)

Related Questions