I am trying to parse a text file. My input file looks like this:
ID: 12343-7888
Name: Mary, Bob, Jason, Jeff, Suzy
Harry, Steve
Larry, George
City: New York, Portland, Dallas, Kansas City
Tampa, Bend
The expected output would be:
"12343-7888"
"Mary, Bob, Jason, Jeff, Suzy, Harry, Steve, Larry, George"
"New York, Portland, Dallas, Kansas City, Tampa, Bend"
Note that the "Name" and "City" fields span several lines. I have the code below, but it is not working: the last line puts each character on its own line. I am also having trouble extracting only the field values, e.g. returning just the names without the "Name: " label. Finally, I would like to put quotes around each returned field.
Can you help fix up my problems?
import scala.io.Source

val lines = Source.fromFile("/filesdata/logfile.text").getLines().toList
val record = lines.dropWhile(line => !line.startsWith("Name: ")).takeWhile(line => !line.startsWith("Address: ")).flatMap(_.split(",")).map(_.trim()).filter(_.nonEmpty).mkString(", ")
val finalResults = record.map(s => "\"" + s + "\"").mkString(",\n")
How can I get my results that I am looking for?
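A minimal sketch of the two problems described above, assuming the "Name" lines have already been collected into a list. The object and member names (`FieldDiagnosis`, `joined`, `perChar`, `fieldValue`) are hypothetical; the sample lines are copied from the question. The key point is that `mkString` returns a single `String`, so a later `.map` on it works on individual `Char`s.

```scala
// Sketch only: FieldDiagnosis, joined, perChar and fieldValue are hypothetical names.
object FieldDiagnosis {
  val nameLines: List[String] = List(
    "Name: Mary, Bob, Jason, Jeff, Suzy",
    "Harry, Steve",
    "Larry, George"
  )

  // mkString collapses the List into one String ...
  val joined: String = nameLines.mkString(", ")

  // ... so mapping over that String visits one Char at a time,
  // which is why every character ends up quoted on its own line.
  val perChar: Seq[String] = joined.map((c: Char) => "\"" + c + "\"")

  // Keeping a List until the very end avoids that; stripPrefix removes
  // the label wherever it is present (here only on the first line).
  def fieldValue(lines: List[String], label: String): String =
    lines
      .map(_.stripPrefix(label + ":"))
      .flatMap(_.split(","))
      .map(_.trim)
      .filter(_.nonEmpty)
      .mkString("\"", ", ", "\"") // quote the whole field once
}
```

In the question's setup, the lines fed to something like `fieldValue` would still need to stop at the next label; the original `takeWhile` looks for `"Address: "`, which never occurs in the sample input, so the "City" lines are swept in as well.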
Upvotes: 1
Views: 1186
Reputation: 44957
SHORT ANSWER
A two-liner that produces a string that looks exactly as you specified:
println(lines.map{line => if(line.trim.matches("[a-zA-Z]+:.*"))
("\"\n\"" + line.split(":")(1).trim) else (", " + line.trim)}.mkString.drop(2) + "\"")
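To check the two-liner against the sample input, it can be wrapped in a function; `ShortAnswer.format` is a hypothetical name, and the expression inside is the one above, only reflowed with comments. One caveat: `line.split(":")(1)` keeps only the text between the first and second colon, so a value that itself contains a colon would be cut off.

```scala
// Sketch only: ShortAnswer.format is a hypothetical wrapper around the two-liner.
object ShortAnswer {
  def format(lines: List[String]): String =
    lines.map { line =>
      if (line.trim.matches("[a-zA-Z]+:.*"))
        "\"\n\"" + line.split(":")(1).trim // labeled line: close the previous field, open a new one
      else
        ", " + line.trim                   // continuation line: append to the current field
    }.mkString.drop(2) + "\""             // drop the leading quote and newline, close the last field
}
```

Called with `Source.fromFile(...).getLines().toList`, it returns the three quoted fields separated by newlines.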
LONG ANSWER
Why try to solve something in one line, if you can achieve the same thing in 94?
(That's the exact opposite of the usual slogan when working with Scala collections, but the input was sufficiently messy that I found it worthwhile to actually write out some of the intermediate steps. Maybe that's just because I've bought a nice new keyboard recently...)
val input = """ID: 12343-7888
Name: Mary, Bob, Jason, Jeff, Suzy
Harry, Steve
Larry, George
City: New York, Portland, Dallas, Kansas City
Tampa, Bend
ID: 567865-676
Name: Alex, Bob
Chris, Dave
Evan, Frank
Gary
City: Los Angeles, St. Petersburg
Washington D.C., Phoenix
"""
case class Entry(id: String, names: List[String], cities: List[String])
def parseMessyInput(input: String): List[Entry] = {
// just a first rough approximation of the structure of the input
sealed trait MessyInputLine { def content: String }
case class IdLine(content: String) extends MessyInputLine
case class NameLine(content: String) extends MessyInputLine
case class UnlabeledLine(content: String) extends MessyInputLine
case class CityLine(content: String) extends MessyInputLine
val lines = input.split("\n").toList
// a helper function for checking whether a line starts with a label
def tryParseLabeledLine
(label: String, line: String)
(cons: String => MessyInputLine)
: Option[MessyInputLine] = {
if (line.startsWith(label + ":")) {
Some(cons(line.drop(label.size + 1)))
} else {
None
}
}
val messyLines: List[MessyInputLine] = for (line <- lines) yield {
(
tryParseLabeledLine("Name", line){NameLine(_)} orElse
tryParseLabeledLine("City", line){CityLine(_)} orElse
tryParseLabeledLine("ID", line){IdLine(_)}
).getOrElse(UnlabeledLine(line))
}
/** Combines the content of the first line with the content
* of all unlabeled lines, until the next labeled line or
* the end of the list is hit. Returns the content of
* the first few lines and the list of the remaining lines.
*/
def readUntilNextLabel(messyLines: List[MessyInputLine])
: (List[String], List[MessyInputLine]) = {
messyLines match {
case Nil => (Nil, Nil)
case h :: t => {
val (unlabeled, rest) = t.span {
case UnlabeledLine(_) => true
case _ => false
}
(h.content :: unlabeled.map(_.content), rest)
}
}
}
/** Glues multiple lines to entries */
def combineToEntries(messyLines: List[MessyInputLine]): List[Entry] = {
if (messyLines.isEmpty) Nil
else {
val (idContent, namesCitiesRest) = readUntilNextLabel(messyLines)
val (namesContent, citiesRest) = readUntilNextLabel(namesCitiesRest)
val (citiesContent, rest) = readUntilNextLabel(citiesRest)
val id = idContent.head.trim
val names = namesContent.flatMap(_.split(",").map(_.trim))
val cities = citiesContent.flatMap(_.split(",").map(_.trim))
Entry(id, names, cities) :: combineToEntries(rest)
}
}
// invoke recursive function on the entire input
combineToEntries(messyLines)
}
// how to use
val entries = parseMessyInput(input)
// output
for (Entry(id, names, cities) <- entries) {
println(id)
println(names.mkString(", "))
println(cities.mkString(", "))
}
Output:
12343-7888
Mary, Bob, Jason, Jeff, Suzy, Harry, Steve, Larry, George
New York, Portland, Dallas, Kansas City, Tampa, Bend
567865-676
Alex, Bob, Chris, Dave, Evan, Frank, Gary
Los Angeles, St. Petersburg, Washington D.C., Phoenix
You could probably write it in one line eventually. But if you write simple code consisting of many small intermediate steps, you don't have to think as hard, and no single step is a large enough obstacle to get stuck on.
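The long answer prints the fields without the surrounding quotes that the question asked for. One way to add them, sketched under the assumption that entries have the same shape as the `Entry` case class defined above (it is repeated here so the snippet compiles on its own); `QuotedOutput`, `quoted` and `render` are hypothetical names:

```scala
// Sketch only: QuotedOutput, quoted and render are hypothetical names.
object QuotedOutput {
  // Same shape as the answer's Entry, repeated so this snippet compiles alone.
  case class Entry(id: String, names: List[String], cities: List[String])

  // Wraps a single field value in double quotes.
  def quoted(s: String): String = "\"" + s + "\""

  // Renders one entry as three quoted lines: id, names, cities.
  def render(e: Entry): String =
    List(e.id, e.names.mkString(", "), e.cities.mkString(", "))
      .map(quoted)
      .mkString("\n")
}
```

Inside the answer's output loop, the same idea amounts to printing `List(id, names.mkString(", "), cities.mkString(", ")).map(quoted).mkString("\n")` instead of the three unquoted `println` calls.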