Reputation: 28129
I have two files in HDFS: one is a CSV file with no header, and the other is a list of column names. I'm wondering if it's possible to assign the column names to the data frame without actually typing them out, as described here.
I'm looking for something like this:
val df = sqlContext.read.format("com.databricks.spark.csv").option("delimiter", "\t").load("/user/training_data.txt")
val header = sqlContext.read.format("com.databricks.spark.csv").option("delimiter", ",").load("/user/col_names.txt")
df.columns(header)
Is this possible?
Upvotes: 1
Views: 1331
Reputation: 646
One way is to read the header file using scala.io.Source, like this:
import scala.io.Source
val header = Source.fromFile("/user/col_names.txt").getLines.map(_.split(","))
val newNames = header.next
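Note that Source.fromFile opens a path on the local filesystem, so it will not see a file that lives only in HDFS, as in the question. A minimal sketch of one alternative, assuming Spark 2.x with a SparkSession named spark in scope and a header file that is a single comma-separated line: read the first line through Spark and split it yourself. parseHeader is a hypothetical helper name, and the Spark call is left as a comment so the snippet runs on its own.

```scala
// Hypothetical helper (not part of the original answer): turn one
// comma-separated header line into an array of trimmed column names.
def parseHeader(line: String): Array[String] =
  line.split(",").map(_.trim)

// Assumption: a SparkSession called `spark` is available (Spark 2.x).
// The header line could then be fetched from HDFS like this:
//   val newNames = parseHeader(spark.read.textFile("/user/col_names.txt").first())

// Stand-alone illustration with an in-memory header line.
val exampleNames = parseHeader("id, name ,score")
```

The resulting array can be passed to toDF in the same way as newNames above.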
Then read the CSV file using spark-csv as you do, specifying no header and assigning the names with toDF:
val df = spark.read.format("com.databricks.spark.csv")
  .option("header", "false").option("delimiter", "\t")
  .load("/user/training_data.txt").toDF(newNames: _*)
Notice the _* type ascription. It tells the compiler to pass the elements of a sequence as individual arguments to a varargs parameter, so toDF(newNames: _*) is equivalent to calling toDF with every column name written out by hand.
More here: What is the purpose of type ascriptions in Scala?
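To see what _* does without Spark, here is a small sketch. describe is a made-up method whose varargs parameter has the same shape as toDF(colNames: String*), and the column names are placeholders.

```scala
// A hypothetical method with a varargs parameter, similar in shape
// to DataFrame.toDF(colNames: String*).
def describe(names: String*): String = names.mkString("[", ", ", "]")

// Placeholder column names for illustration only.
val cols = Array("id", "name", "score")

// `cols: _*` passes each element of the array as a separate argument,
// so this call is the same as describe("id", "name", "score").
val expanded = describe(cols: _*)
val explicit = describe("id", "name", "score")
```

Without the : _* the call would not compile, because an Array[String] is not a String.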
Upvotes: 2