Tong

Reputation: 539

How to merge two text files and convert them to a CSV file in Scala

I use the following code to export a DataFrame:

df.select("A", "b", "C", "D","E")
  .write.format("com.databricks.spark.csv")
  .save("newiris.csv")

I get two text files, as follows:

part-00000

5.1,3.5,1.4,0.2,Iris-setosa
4.9,3,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa

part-00001

6.7,3,5,1.7,Iris-versicolor
6,2.9,4.5,1.5,Iris-versicolor
5.7,2.6,3.5,1,Iris-versicolor
5.5,2.4,3.8,1.1,Iris-versicolor
5.5,2.4,3.7,1,Iris-versicolor
5.8,2.7,3.9,1.2,Iris-versicolor

Now I want to combine them into one file, like this:

5.1,3.5,1.4,0.2,Iris-setosa
4.9,3,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
6.7,3,5,1.7,Iris-versicolor
6,2.9,4.5,1.5,Iris-versicolor
5.7,2.6,3.5,1,Iris-versicolor
5.5,2.4,3.8,1.1,Iris-versicolor
5.5,2.4,3.7,1,Iris-versicolor
5.8,2.7,3.9,1.2,Iris-versicolor

Then I want to convert the combined file to CSV. How can I do this in Scala?

Upvotes: 1

Views: 4609

Answers (1)

Brian

Reputation: 20285

The necessary Scala pieces here are scala.io.Source to read each file and get its lines, ++ to concatenate the lines of part-00000 and part-00001, and foreach to go through the combined data and write it to a file. File I/O is the same as in Java.

scala> import java.io._

scala> import scala.io.Source

scala> val part0 = Source.fromFile("part-00000.txt").getLines
part0: Iterator[String] = non-empty iterator

scala> val part1 = Source.fromFile("part-00001.txt").getLines
part1: Iterator[String] = non-empty iterator

scala> val part2 = part0.toList ++ part1.toList
part2: List[String] = List(5.1,3.5,1.4,0.2,Iris-setosa, 4.9,3,1.4,0.2,Iris-setosa, 4.7,3.2,1.3,0.2,Iris-setosa, 4.6,3.1,1.5,0.2,Iris-setosa, 5,3.6,1.4,0.2,Iris-setosa, 5.4,3.9,1.7,0.4,Iris-setosa, 6.7,3,5,1.7,Iris-versicolor, 6,2.9,4.5,1.5,Iris-versicolor, 5.7,2.6,3.5,1,Iris-versicolor, 5.5,2.4,3.8,1.1,Iris-versicolor, 5.5,2.4,3.7,1,Iris-versicolor, 5.8,2.7,3.9,1.2,Iris-versicolor)

scala> val part00002 = new File("part-00002")
part00002: java.io.File = part-00002

scala> val bw = new BufferedWriter(new FileWriter(part00002))
bw: java.io.BufferedWriter = java.io.BufferedWriter@56826a75

scala> part2.foreach(p => bw.write(p + "\n"))

scala> bw.close

Inspect the file:

brian:/tmp/ $ cat part-00002
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
6.7,3,5,1.7,Iris-versicolor
6,2.9,4.5,1.5,Iris-versicolor
5.7,2.6,3.5,1,Iris-versicolor
5.5,2.4,3.8,1.1,Iris-versicolor
5.5,2.4,3.7,1,Iris-versicolor
5.8,2.7,3.9,1.2,Iris-versicolor
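
The REPL session above never closes the two Source handles and loads both files into Lists before writing. As a rough sketch of the same idea outside the REPL, the version below streams each input line by line and closes every handle in a finally block. The names MergeParts, merge and newiris-merged.csv are made up for illustration and are not part of any library. Since every line in the part files is already comma-separated, the merged file should already be valid CSV, so giving it a .csv name is probably all the "conversion" that is needed.

```scala
import java.io.{BufferedWriter, File, FileWriter}
import scala.io.Source

// Hypothetical helper object; the name is illustrative only.
object MergeParts {
  // Copies the lines of each input file, in the order given, into one output file.
  def merge(inputs: Seq[String], output: String): Unit = {
    val writer = new BufferedWriter(new FileWriter(new File(output)))
    try {
      inputs.foreach { path =>
        val source = Source.fromFile(path)
        try {
          // getLines() is lazy, so only one line is held in memory at a time.
          source.getLines().foreach { line =>
            writer.write(line)
            writer.newLine()
          }
        } finally source.close() // release the input file even if writing fails
      }
    } finally writer.close() // flushes buffered output and releases the file
  }
}
```

Assuming the part files are in the current directory, it could be called as MergeParts.merge(Seq("part-00000", "part-00001"), "newiris-merged.csv").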

Upvotes: 2
