Reputation: 31
I have a CSV file containing a data matrix. The first column of this matrix contains a label and the remaining columns contain the values associated with that label (i.e. with the first column). Now I want to read this CSV file and put the data into a Map[String,Array[String]] in Scala. The key of the Map should be the label (the one in the first column) and the Map values should be the other values (those in the rest of the columns). To read the CSV file I use opencsv.
val isr: InputStreamReader = new InputStreamReader(getClass.getResourceAsStream("test.csv"))
val data: IndexedSeq[Array[String]] = new CSVReader(isr).readAll.asScala.toIndexedSeq
Now I have all the data in an IndexedSeq[Array[String]]. Can I use this functional way here, or should I rather choose an iterative way, because it can get complex to read all data at once? Well, now I need to create the Map from this IndexedSeq. Therefore I map the IndexedSeq to an IndexedSeq of Tuple2[String,Array[String]] to separate the label value from the rest of the values, and then I create the Map from that.
val result: Map[String, Array[String]] = data.filter(e => !e.isEmpty).map(e => (e.head, e.tail)).toMap
This works for small examples, but when I use it to read the content of my CSV file it throws a java.lang.RuntimeException. I also tried to create the Map with a groupBy, and to create several Maps (one per line) and reduce them afterwards into one big Map, but without success. I also read another post on Stack Overflow where somebody assumes that toMap has a complexity of O(n²). This is the end of my stack trace (the whole stack trace is quite long):
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.jetbrains.plugins.scala.testingSupport.specs2.JavaSpecs2Runner.runSingleTest(JavaSpecs2Runner.java:130)
at org.jetbrains.plugins.scala.testingSupport.specs2.JavaSpecs2Runner.main(JavaSpecs2Runner.java:76)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
Caused by: java.lang.RuntimeException: can not create specification: com.test.MyClassSpec
at scala.sys.package$.error(package.scala:27)
at org.specs2.specification.SpecificationStructure$.createSpecification(BaseSpecification.scala:96)
at org.specs2.runner.ClassRunner.createSpecification(ClassRunner.scala:64)
at org.specs2.runner.ClassRunner.start(ClassRunner.scala:35)
at org.specs2.runner.ClassRunner.main(ClassRunner.scala:28)
at org.specs2.runner.NotifierRunner.main(NotifierRunner.scala:24)
... 11 more
Process finished with exit code 1
Does anybody know another way to create a Map from the data in a CSV file?
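For reference, the "several Maps reduced into one" variant I described would look roughly like this as a single foldLeft (the sample rows here are made up for illustration):

```scala
// Sketch of the reduce-style variant: fold the rows into the Map directly,
// skipping the intermediate sequence of tuples. Sample rows are made up.
val rows: IndexedSeq[Array[String]] =
  IndexedSeq(Array("label1", "a", "b"), Array("label2", "c"))

val result: Map[String, Array[String]] =
  rows.foldLeft(Map.empty[String, Array[String]]) { (acc, row) =>
    if (row.isEmpty) acc else acc + (row.head -> row.tail)
  }
```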
Upvotes: 3
Views: 20277
Reputation: 1143
Not quite what you asked for, but here's how to do it using my own dogfood:
val data = CsvParser[String,Int,Double].parseFile("sample.csv")
data: org.catch22.collections.immutable.CollSeq3[String,Int,Double] =
CollSeq((Jan,10,22.33),
(Feb,20,44.2),
(Mar,25,55.1))
scala> val lookup=(data._1 zip data).toMap
lookup: scala.collection.immutable.Map[String,Product3[String,Int,Double]] = Map(Jan -> (Jan,10,22.33), Feb -> (Feb,20,44.2), Mar -> (Mar,25,55.1))
scala> lookup("Feb")
res0: Product3[String,Int,Double] = (Feb,20,44.2)
Upvotes: 0
Reputation: 8932
This worked for me:
import scala.io.Source
Source.fromFile("some_very_big_file").getLines.map(_.split(";")).count(_ => true)
The split breaks up each line of the CSV file into simple records. The count is only there to check that the file is really read.
So now we can use this to read in a real CSV file (although I only tested it with a small file):
scala> val content=Source.fromFile("test.csv").getLines.map(_.split(";"))
content: Iterator[Array[java.lang.String]] = non-empty iterator
scala> val header=content.next
header: Array[java.lang.String] = Array(Elements, Duration)
scala> content.map(header.zip(_).toMap)
res40: Iterator[scala.collection.immutable.Map[java.lang.String,java.lang.String]] = non-empty iterator
This works quite well with simple CSV files. If you have more complex ones (e.g. entries split over several lines), you might have to use a more complete CSV parser (e.g. Apache Commons CSV). But usually such a parser will also give you some kind of iterator, and you can use the same map(... zip ...) call on it.
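To get the label -> values Map asked for in the question, the same idea works with head and tail instead of zipping against the header; a small self-contained sketch (using an inline example string instead of a file):

```scala
import scala.io.Source

// Same approach, but building the label -> values Map from the question;
// the semicolon-separated content here is just an inline example.
val csv = "label1;a;b\nlabel2;c;d"
val lookup: Map[String, Array[String]] =
  Source.fromString(csv).getLines()
    .map(_.split(";"))
    .filter(_.nonEmpty)
    .map(a => a.head -> a.tail)
    .toMap
```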
Upvotes: 10
Reputation: 35453
You could skip the intermediate sequence of tuples and just build the Map directly, like this:
val result: Map[String, Array[String]] = data.filter(e => !e.isEmpty).map(e => (e.head,e.tail))(collection.breakOut)
Not sure if this will fix your issue, but you did ask if there was another way to build the Map. You can read more about collection.breakOut here:
Scala: List[Tuple3] to Map[String,String]
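Note that collection.breakOut only exists up to Scala 2.12; on 2.13+ an iterator gives a similar single-pass effect. A minimal sketch with made-up rows:

```scala
// Works on any Scala version: the iterator avoids materialising the
// intermediate sequence of tuples, similar to what breakOut does.
// The rows below are made up for illustration.
val data = IndexedSeq(Array("label1", "a", "b"), Array("label2", "c"))

val result: Map[String, Array[String]] =
  data.iterator.filter(_.nonEmpty).map(e => e.head -> e.tail).toMap
```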
Upvotes: 1