zelenov aleksey

Reputation: 398

Process large text file using Zeppelin and Spark

I'm trying to analyze (visualize, actually) some data from a large text file (over 50 GB) using Zeppelin (Scala). Examples from the web use CSV files with a known header and known datatypes for each column. In my case, I have lines of pure data with a ";" delimiter. How do I achieve putting my data into a DataFrame like in the code below?

case class Record(id: Int, name: String)

val myFile1 = myFile.map(x => x.split(";")).map {
  case Array(id, name) => Record(id.toInt, name)
}

myFile1.toDF() // DataFrame will have columns "id" and "name"

P.S. I want a DataFrame with columns "1", "2", ... Thanks.

Upvotes: 2

Views: 629

Answers (1)

user6022341

Reputation:

You can use the CSV reader with a custom delimiter:

spark.read.option("delimiter", ";").csv(inputPath)
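A minimal sketch of the full workflow, assuming Spark 2.x and a semicolon-delimited file at `inputPath` (a hypothetical path). The CSV reader assigns default column names `_c0`, `_c1`, ..., so a `toDF` call can rename them to "1", "2", ... as the question asks:

// Read the raw file; each line is split on ";" into string columns
// named _c0, _c1, ... by default.
val df = spark.read
  .option("delimiter", ";")
  .csv(inputPath)

// Rename the columns to "1", "2", ... using toDF's varargs overload.
val renamed = df.toDF(df.columns.indices.map(i => (i + 1).toString): _*)

Because the reader is lazy, this also works on a 50 GB file without loading it into memory; you can add `.option("inferSchema", "true")` if you want numeric types instead of strings, at the cost of an extra pass over the data.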

Upvotes: 1
