Reputation: 1448
I am using Apache Spark with Scala.
I have a CSV file that does not have column names in the first row. It looks like this:
28,Martok,49,476
29,Nog,48,364
30,Keiko,50,175
31,Miles,39,161
The columns represent ID, name, age, numOfFriends.
In my Scala object, I am creating a DataFrame from the CSV file using SparkSession as follows:
val spark = SparkSession.builder.master("local[*]").getOrCreate()
val df = spark.read.option("inferSchema","true").csv("../myfile.csv")
df.printSchema()
When I run the program, the result is:
|-- _c0: integer (nullable = true)
|-- _c1: string (nullable = true)
|-- _c2: integer (nullable = true)
|-- _c3: integer (nullable = true)
How can I add names to the columns in my dataset?
Upvotes: 10
Views: 18849
Reputation: 11
The toDF method can be used to pass in the column names when using the Spark Java API.
Example:
Dataset<Row> rowsWithTitle = sparkSession.read().option("header", "true").option("delimiter", "\t").csv("file").toDF("h1", "h2");
Upvotes: 1
Reputation: 22449
You can use toDF to specify column names when reading the CSV file:
val df = spark.read.option("inferSchema","true").csv("../myfile.csv").toDF(
"ID", "name", "age", "numOfFriends"
)
Or, if you already have the DataFrame created, you can rename its columns as follows:
val newColNames = Seq("ID", "name", "age", "numOfFriends")
val df2 = df.toDF(newColNames: _*)
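As a quick sanity check (a minimal sketch, assuming the df2 from above and the types inferred from the asker's file), printing the schema should now show the new names:
df2.printSchema()
// root
//  |-- ID: integer (nullable = true)
//  |-- name: string (nullable = true)
//  |-- age: integer (nullable = true)
//  |-- numOfFriends: integer (nullable = true)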
Upvotes: 26