amarchin

Reputation: 2124

Add a new line to a text file in Spark

I have read a text file in Spark using the command

val data = sc.textFile("/path/to/my/file/part-0000[0-4]")

I would like to add a new line as a header of my file. Is there a way to do that without turning the RDD into an Array?

Thank you!

Upvotes: 6

Views: 8739

Answers (2)

Daniel Darabos

Reputation: 27455

"Part" files are automatically handled as a file set.

val data = sc.textFile("/path/to/my/file") // Will read all parts.

Just add the header and write it out:

val header = sc.parallelize(Seq("...header..."))
val withHeader = header ++ data
withHeader.saveAsTextFile("/path/to/my/modified-file")
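If the extra partition (and extra output file) produced by the union is a concern, one possible variation is to prepend the header inside the first partition with mapPartitionsWithIndex. This is only a sketch: the local SparkContext and the small in-memory RDD standing in for data are assumptions made so the snippet runs on its own.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: local mode for illustration; in spark-shell, `sc` already exists.
val sc = new SparkContext(
  new SparkConf().setAppName("prepend-header-sketch").setMaster("local[2]"))

// Stand-in for the RDD that sc.textFile(...) would return.
val data = sc.parallelize(Seq("row 1", "row 2", "row 3"), numSlices = 2)
val headerLine = "...header..."

// Emit the header ahead of the lines of partition 0; leave other partitions as-is.
val withHeader = data.mapPartitionsWithIndex { (index, lines) =>
  if (index == 0) Iterator(headerLine) ++ lines else lines
}

val firstLine = withHeader.first()
val partitionCount = withHeader.partitions.length

println(s"first line: $firstLine, partitions: $partitionCount")
sc.stop()
```

The number of partitions stays the same as in data, so saveAsTextFile would write the same number of part files, with the header at the top of the first one.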

Note that because this has to read and write all the data, it will be considerably slower than you might intuitively expect. (After all, you're just adding a single new line!) For this reason and others, you may be better off not adding the header at all, and instead storing the metadata (the list of columns) separately from the data.
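A minimal sketch of the separate-metadata idea follows. The column names, the temporary directory, and the local SparkContext are all hypothetical and chosen only so the snippet is self-contained. The header is written once to its own small location, and the existing part files are never rewritten.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: local mode for illustration; in spark-shell, `sc` already exists.
val sc = new SparkContext(
  new SparkConf().setAppName("separate-header-sketch").setMaster("local[2]"))

// Hypothetical column names and a hypothetical location for the metadata.
val columns = Seq("id", "name", "value")
val baseDir = Files.createTempDirectory("header-sketch").toString
val headerPath = s"$baseDir/header"

// Write the header as a single-partition RDD; the data files are not touched.
// Note: saveAsTextFile fails if headerPath already exists.
sc.parallelize(Seq(columns.mkString(",")), numSlices = 1).saveAsTextFile(headerPath)

// Later, read the header back and split it into column names.
val header = sc.textFile(headerPath).first()
val columnNames = header.split(",").toSeq

println(columnNames.mkString(" | "))
sc.stop()
```

Any job that needs the column names can read them from the metadata location and then load the data with sc.textFile as before.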

Upvotes: 2

Odomontois

Reputation: 16308

You cannot actually control whether the new line will come first (as a header) or not, but you can create a new single-element RDD and merge it with the existing one:

val extendedData = data ++ sc.makeRDD(Seq("my precious new line"))

Then

extendedData.filter(_.startsWith("my precious")).first()

should confirm that your line has been added.
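A slightly firmer check can be sketched by comparing counts. Here a small in-memory RDD stands in for data, and the SparkContext runs in local mode; both are assumptions made so the snippet runs on its own.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: local mode for illustration; in spark-shell, `sc` already exists.
val sc = new SparkContext(
  new SparkConf().setAppName("append-check-sketch").setMaster("local[2]"))

// Stand-in for the RDD read with sc.textFile in the question.
val data = sc.parallelize(Seq("row 1", "row 2", "row 3"))

val extendedData = data ++ sc.makeRDD(Seq("my precious new line"))

// The union should add exactly one element...
val added = extendedData.count() - data.count()
// ...and exactly one line should match the prefix.
val matches = extendedData.filter(_.startsWith("my precious")).count()

println(s"added=$added, matches=$matches")
sc.stop()
```

Both values should be 1 if the line was added exactly once.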

Upvotes: 1
