MrGildarts
MrGildarts

Reputation: 833

data lost when converting DStream to Dataframe

I have an issue when i try to convert my DStream[String] into Dataframes.

My goal is to transform the twitter stream[rdd] into dataframes, but with my code (below) the transformation doesn't work, at the end i receive i dataframe with only one word.

For example :hi every body

my dataframe will contain only the words "hi"

here the piece of the code

val splited_test=texts.transform(rdd => rdd.map(x=> Row.fromSeq(x.split(" "))))


    splited_test.foreachRDD { rdd =>{

      val fields = new Array[StructField](1)
      fields(0)=(DataTypes.createStructField("text", StringType, true))
      val schema = DataTypes.createStructType(fields)
      val df= sqlContext.createDataFrame(rdd, schema)
}}

Upvotes: 0

Views: 581

Answers (1)

lucas kim
lucas kim

Reputation: 910

Only the first word is stored because you used x.split (" ").

You created one field.

Modify the code as follows.

val splited_test=texts.transform(rdd => rdd.map(x=> Row.fromSeq(Seq(x))))

Upvotes: 1

Related Questions