Placid

Reputation: 1448

Add column names to data read from csv file without column names

I am using Apache Spark with Scala.

I have a csv file that does not have column names in the first row. It's like this:

28,Martok,49,476
29,Nog,48,364
30,Keiko,50,175
31,Miles,39,161

The columns represent ID, name, age, numOfFriends.

In my Scala object, I am creating a Dataset from the csv file using SparkSession, as follows:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").getOrCreate()
val df = spark.read.option("inferSchema", "true").csv("../myfile.csv")
df.printSchema()

When I run the program, the result is:

|-- _c0: integer (nullable = true)
|-- _c1: string (nullable = true)
|-- _c2: integer (nullable = true)
|-- _c3: integer (nullable = true)

How can I add names to the columns in my dataset?

Upvotes: 10

Views: 18849

Answers (2)

padmaja ramesh

Reputation: 11

The toDF method can be used to pass in the column names. An example using the Spark Java API:

Dataset<Row> rowsWithTitle = sparkSession.read().option("inferSchema", "true").csv("file").toDF("ID", "name", "age", "numOfFriends");

Upvotes: 1

Leo C

Reputation: 22449

You can use toDF to specify column names when reading the CSV file:

val df = spark.read.option("inferSchema","true").csv("../myfile.csv").toDF(
  "ID", "name", "age", "numOfFriends"
)

Or, if you already have the DataFrame created, you can rename its columns as follows:

val newColNames = Seq("ID", "name", "age", "numOfFriends")
val df2 = df.toDF(newColNames: _*)
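
For completeness, here is a minimal self-contained sketch of both ways to end up with named columns: renaming positionally with toDF, and supplying an explicit schema at read time so that names and types are set in one step. The object name NamedColumns, the helper names readWithToDF and readWithSchema, and the in-memory sample lines are assumptions made for illustration; in the question's setup you would pass the file path to csv(...) instead. Note that toDF expects exactly one name per column.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical object and helper names, used only for this sketch.
object NamedColumns {
  val columnNames: Seq[String] = Seq("ID", "name", "age", "numOfFriends")

  // Explicit schema: column names and types are declared up front,
  // so no inferSchema pass over the data is needed.
  val schema: StructType = StructType(Seq(
    StructField("ID", IntegerType, nullable = true),
    StructField("name", StringType, nullable = true),
    StructField("age", IntegerType, nullable = true),
    StructField("numOfFriends", IntegerType, nullable = true)
  ))

  // Option 1: let Spark infer the types, then rename the columns by position.
  def readWithToDF(spark: SparkSession, lines: Dataset[String]): DataFrame =
    spark.read.option("inferSchema", "true").csv(lines).toDF(columnNames: _*)

  // Option 2: supply the schema when reading.
  def readWithSchema(spark: SparkSession, lines: Dataset[String]): DataFrame =
    spark.read.schema(schema).csv(lines)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("NamedColumns").getOrCreate()
    import spark.implicits._

    // In-memory stand-in for the header-less CSV file from the question.
    val lines = Seq("28,Martok,49,476", "29,Nog,48,364").toDS()

    readWithToDF(spark, lines).printSchema()
    readWithSchema(spark, lines).printSchema()

    spark.stop()
  }
}
```

The explicit schema avoids the extra pass over the data that inferSchema performs, at the cost of having to know the column types in advance.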

Upvotes: 26
