SomeGuy

Reputation: 265

Parse CSV data that is stored in a single row of a dataframe

I have data like the example below in a dataframe. Note that Contents is the only column, and the dataframe has only one record, which holds all of the data. Within that data, the first line is the header and lines are separated by LF.

How can I generate a new dataframe that has 3 columns and the corresponding data?

display(df)

Contents
============================
"DateNum","MonthNum","DayName"
"19910101","1","Tue"
"19910102","1","Wed"
"19910103","1","Thu"

Upvotes: 1

Views: 377

Answers (1)

mck

Reputation: 42422

You can split the contents on newlines to get an RDD[String], convert that to a Dataset[String], and pass it to the CSV reader to get a dataframe:

import spark.implicits._  // needed for toDS; already in scope in spark-shell
val df2 = spark.read.option("header", true).csv(df.rdd.flatMap(_.getString(0).split("\n")).toDS)

df2.show
+--------+--------+-------+
| DateNum|MonthNum|DayName|
+--------+--------+-------+
|19910101|       1|    Tue|
|19910102|       1|    Wed|
|19910103|       1|    Thu|
+--------+--------+-------+
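
For anyone running this outside spark-shell, here is a minimal self-contained sketch of the same idea. It assumes Spark 2.2 or later (where the CSV reader accepts a Dataset[String]), a local session, and a dataframe with a single non-null string column. The names SingleCellCsv and parse are made up for illustration, and the sample data is copied from the question. It goes through df.as[String] instead of df.rdd, which is a variation on the one-liner above, not a replacement for it.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SingleCellCsv {

  // Hypothetical helper: assumes df has exactly one non-null string column
  // whose cells each hold LF-separated CSV text.
  def parse(spark: SparkSession, df: DataFrame): DataFrame = {
    import spark.implicits._ // Encoder[String] for as[String] and flatMap
    // One output element per line of the embedded CSV text.
    val lines = df.as[String].flatMap(cell => cell.split("\n"))
    // The first line is consumed as the header; without inferSchema,
    // every column is read as a string.
    spark.read.option("header", true).csv(lines)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, purely for demonstration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("single-cell-csv")
      .getOrCreate()
    import spark.implicits._

    // Rebuild the single-row, single-column dataframe from the question.
    val contents = Seq(
      "\"DateNum\",\"MonthNum\",\"DayName\"",
      "\"19910101\",\"1\",\"Tue\"",
      "\"19910102\",\"1\",\"Wed\"",
      "\"19910103\",\"1\",\"Thu\""
    ).mkString("\n")
    val df = Seq(contents).toDF("Contents")

    parse(spark, df).show()
    spark.stop()
  }
}
```

One caveat with splitting on "\n" before parsing: a quoted field that itself contains a line break would be cut in two, so this only suits data where every record sits on one line.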

Upvotes: 2
