John Thomas

Reputation: 222

How to create a Dataframe from a String?

I have a String like the one below: each line is separated by a newline and the fields are separated by spaces. The first row is my header.

col1 col2 col3 col4 col5 col6 col7 col8
val1 val2 val3 val4 val5 val6 val7 val8
val9 val10 val11 val12 val13 val14 val15 val16
val17 val18 val19 val20 val21 val22 val23 val24

How can I build a Spark DataFrame from this String in Java?

Upvotes: 2

Views: 2905

Answers (2)

Leo C

Reputation: 22449

I believe @Shankar Koirala has already provided a solution in Java by treating the text/string file as a CSV file (with the custom separator " " instead of ","). Below is a Scala equivalent of the same approach:

val spark = org.apache.spark.sql.SparkSession.builder.
  master("local").
  appName("Spark custom CSV").
  getOrCreate

// read the space-delimited file as CSV, using the first line as the header
val df = spark.read.
  option("header", "true").
  option("delimiter", " ").
  csv("/path/to/textfile")

df.show
+-----+-----+-----+-----+-----+-----+-----+-----+
| col1| col2| col3| col4| col5| col6| col7| col8|
+-----+-----+-----+-----+-----+-----+-----+-----+
| val1| val2| val3| val4| val5| val6| val7| val8|
| val9|val10|val11|val12|val13|val14|val15|val16|
|val17|val18|val19|val20|val21|val22|val23|val24|
+-----+-----+-----+-----+-----+-----+-----+-----+

[UPDATE] Create DataFrame from string content

val s: String = """col1 col2 col3 col4 col5 col6 col7 col8
                  |val1 val2 val3 val4 val5 val6 val7 val8
                  |val9 val10 val11 val12 val13 val14 val15 val16
                  |val17 val18 val19 val20 val21 val22 val23 val24
|"""

// remove header line
val s2 = s.substring(s.indexOf('\n') + 1)

// create an RDD of field arrays
val rdd = spark.sparkContext.parallelize( s2.split("\n").map(_.split(" ")) )

// create DataFrame (toDF requires the session's implicits in scope)
import spark.implicits._

val df = rdd.map{ case Array(c1, c2, c3, c4, c5, c6, c7, c8) => (c1, c2, c3, c4, c5, c6, c7, c8) }.
  toDF("col1", "col2", "col3", "col4", "col5", "col6", "col7", "col8")

df.show
+-----+-----+-----+-----+-----+-----+-----+-----+
| col1| col2| col3| col4| col5| col6| col7| col8|
+-----+-----+-----+-----+-----+-----+-----+-----+
| val1| val2| val3| val4| val5| val6| val7| val8|
| val9|val10|val11|val12|val13|val14|val15|val16|
|val17|val18|val19|val20|val21|val22|val23|val24|
+-----+-----+-----+-----+-----+-----+-----+-----+
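
Since the original question asks for Java, here is a rough Java sketch of the same string-to-DataFrame idea, building an explicit schema from the header line and one Row per data line. The class and variable names here are only illustrative, not part of the answers above:

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class StringToDataFrame {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .master("local[*]")
        .appName("String to DataFrame")
        .getOrCreate();

    // the raw string from the question, lines separated by '\n'
    String s = "col1 col2 col3 col4 col5 col6 col7 col8\n"
        + "val1 val2 val3 val4 val5 val6 val7 val8\n"
        + "val9 val10 val11 val12 val13 val14 val15 val16\n"
        + "val17 val18 val19 val20 val21 val22 val23 val24";

    String[] lines = s.split("\n");

    // build the schema from the header row (all columns as strings)
    List<StructField> fields = Arrays.stream(lines[0].split(" "))
        .map(name -> DataTypes.createStructField(name, DataTypes.StringType, true))
        .collect(Collectors.toList());
    StructType schema = DataTypes.createStructType(fields);

    // turn every remaining line into a Row
    List<Row> rows = Arrays.stream(lines).skip(1)
        .map(line -> RowFactory.create((Object[]) line.split(" ")))
        .collect(Collectors.toList());

    Dataset<Row> df = spark.createDataFrame(rows, schema);
    df.show();
  }
}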

Upvotes: 2

koiralo

Reputation: 23119

You can read a CSV file with the Spark Java API as follows. First, create the SparkSession:

SparkSession spark = SparkSession.builder()
  .master("local[*]")
  .appName("Example")
  .getOrCreate();

// read the file with header = true and delimiter " " (space)
Dataset<Row> df = spark.read()
    .option("delimiter", " ")
    .option("header", true)
    .csv("path to file");
df.show();
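
If the data is only available as a String rather than a file, the same CSV reader can also consume an in-memory Dataset<String> (available since Spark 2.2). A minimal sketch, reusing the spark session created above; the string literal is just the sample data from the question:

import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;

// the raw string, lines separated by '\n'
String s = "col1 col2 col3 col4 col5 col6 col7 col8\n"
    + "val1 val2 val3 val4 val5 val6 val7 val8\n"
    + "val9 val10 val11 val12 val13 val14 val15 val16\n"
    + "val17 val18 val19 val20 val21 val22 val23 val24";

// wrap the lines in a Dataset<String> and hand it to the CSV reader
Dataset<String> lines = spark.createDataset(Arrays.asList(s.split("\n")), Encoders.STRING());

Dataset<Row> df = spark.read()
    .option("delimiter", " ")
    .option("header", true)
    .csv(lines);
df.show();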

Upvotes: 0
