Reputation: 123
Objective:
Given a string of the form val strang = "ab(cde(fgh)ijk)lmn)opq)rs"
First I want to construct a vector of the string split by the "(" and ")" characters. This is accomplished with,
val regexPattern = "((?<=\\()|(?=\\())|((?<=\\))|(?=\\)))"
val data: Vector[String] = strang .split(regexPattern ).map(_.toString).toVector
Note that the actual string will be much more varied, but how it's split stands.
Secondly and this is the hard part that I need help with. I want to iterate over the data vector and construct a nested vector with the following logic (below is psudo-codish)
val newVector =
for(row <- data) {
if(row == "(") // opening angle bracket
start constructing a nested vector inside of newVector
keep appending to the nested vector until we encounter a ")"
character (closing parenthesis).
IF before encountering a ")" we encounter another "(" then
create another nestedVector inside the previous nested vector
and keep appending to it until a closing parenthesis ")" is
encountered
else if(row == ")")
if we encounter a closing parenthesis, go up one level of the nested
vectors
else
simply append this row to the new vector
newVector :+ row
}
I have made quite a few attempts at this with limited success. Using a recursive function I manage just one level of nesting but to go beyond that I end up using while loops which seems like overkill. The nested vectors can be of a special type. I have tried for example case class Rowy(row: String, rows: Vector[Vector[String]]) or case class Rowy(row: String, rows: Vector(Rowy))
I'm relatively new to scala and am wondering if there is an elegant way to approach this using scanLeft or foldLeft methods. Any help would be appreciated.
PS: I don't use an actual for loop to iterate over the "data" vector but rather a recursive function. The for loop is just to convey that an iteration is taking place. Any help would be appreciated.
Upvotes: 1
Views: 886
Reputation: 127721
In your example, parentheses in the string are not balanced, and it is not really clear from your algorithm definition how it should be handled. If we assume that the input is actually well-balanced, then your native data structure would be a tree, with branches being one level of parentheses and leaves being the "words". Then the algorithm you need is just a simple stack-based tree construction. In the following example I use an explicit stack, but you can rewrite the algorithm to be recursive, and then the stack will be implicit.
// Tree structure definition
sealed trait Data
object Data {
case class Leaf(data: String) extends Data
case class Branch(parts: Vector[Data]) extends Data {
// a convenience method to append data to a branch more easily
def :+(data: Data): Branch = copy(parts = parts :+ data)
}
}
object Main extends App {
val regexPattern = "((?<=\\()|(?=\\())|((?<=\\))|(?=\\)))"
val string = "ab(cde(fgh)ijk)lmn" // now the input is balanced
val data: Vector[String] = string.split(regexPattern).toVector
println(data)
var stack: List[Data.Branch] = Data.Branch(Vector.empty) :: Nil
for (part <- data) {
part match {
case "(" =>
stack = Data.Branch(Vector.empty) :: stack
case ")" =>
val top :: parent :: rest = stack
stack = (parent :+ top) :: rest
case _ =>
stack = (stack.head :+ Data.Leaf(part)) :: stack.tail
}
}
val result = stack.head
println(result)
}
It outputs this:
Vector(ab, (, cde, (, fgh, ), ijk, ), lmn)
Branch(Vector(Leaf(ab), Branch(Vector(Leaf(cde), Branch(Vector(Leaf(fgh))), Leaf(ijk))), Leaf(lmn)))
If we reformat that a bit, we get
Branch(Vector(
Leaf(ab),
Branch(Vector(
Leaf(cde),
Branch(Vector(
Leaf(fgh)
)),
Leaf(ijk)
)),
Leaf(lmn)
))
which, as you can see, mirrors the structure inside your string.
If your input could have unbalanced parentheses, then you might need to add additional checks to the loop in the case ")"
part. Right now, if the input is not balanced, a MatchError
would be thrown because the stack
variable would not have the required number of elements.
Upvotes: 1